RE: Rados gateway init timeout with cache

We lost data in the notify and gc objects. What bothers me is that the rados gateway can start if we deactivate the cache.
I think the unavailability of the cache objects shouldn't take down the rados gateway. The option should be more like "I want the cache if available".
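
To illustrate the "cache if available" behaviour described above, here is a minimal sketch in Python. It is not the gateway's actual code: `init_cache_best_effort`, its `init_watch` argument, and the `timeout_s` parameter are hypothetical names used only for illustration. It shows the general pattern of bounding a possibly hanging initialization step with a timeout and falling back to running without the cache. It does not address whatever consistency guarantees the watch/notify mechanism is meant to provide.

```python
import threading


def init_cache_best_effort(init_watch, timeout_s=5.0):
    """Run init_watch with a time limit.

    Returns True if init_watch finished without error inside the limit
    (the cache can be used), and False if it raised or was still running
    when the limit expired (the caller should start without the cache).
    Both init_watch and timeout_s are hypothetical, for illustration only.
    """
    result = {}

    def worker():
        try:
            init_watch()
            result["ok"] = True
        except Exception as exc:  # record the failure, do not crash the caller
            result["error"] = exc

    # A daemon thread does not block process exit if init_watch never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    # No "ok" entry means either a timeout or an exception.
    return result.get("ok", False)
```

A caller would then enable the cache only when the function returns True, for example `use_cache = init_cache_best_effort(my_init_watch, timeout_s=30.0)`, where `my_init_watch` stands in for whatever registers the watch.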

-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Tuesday, 8 January 2013 18:03
To: Yann ROBIN
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Rados gateway init timeout with cache

To clarify, you lost the data on half of your OSDs? And it sounds like they weren't in separate CRUSH failure domains?

Given that, yep, you've lost some data. :(

On Tue, Jan 8, 2013 at 5:41 AM, Yann ROBIN <yann.robin@xxxxxxxxxxxxx> wrote:
> Notify and gc objects were unfound; we marked them as lost and now the rados gateway starts.
> But this means that if some notify objects are not fully available, the rados gateway stops responding.

Yes, that's the case. I'm not sure there's a way around it that makes much sense and satisfies the necessary guarantees, though.
-Greg

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx 
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Yann ROBIN
> Sent: Tuesday, 8 January 2013 12:13
> To: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Rados gateway init timeout with cache
>
> Hi,
>
> We recently experienced an issue with the backplane of our server, resulting in losing half of our OSDs.
> During that period the rados gateway failed to initialize (timeout).
> We found that the gateway was hanging in the init_watch function.
>
> We recreated our OSDs and we still have this issue, but the PGs are not all in an active+clean state:
>    health HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 2 pgs recovery_wait; 3 pgs stuck unclean; recovery 7/10140464 degraded (0.000%); 3/5070232 unfound (0.000%); noout flag(s) set
>    monmap e2: 3 mons at {ceph-mon-1=172.20.1.13:6789/0,ceph-mon-2=172.20.2.13:6789/0,ceph-mon-3=172.17.9.20:6789/0}, election epoch 256, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
>    osdmap e4439: 6 osds: 6 up, 6 in
>     pgmap v2531184: 11024 pgs: 11019 active+clean, 2 active+recovery_wait, 1 active+recovering+degraded+remapped, 2 active+clean+scrubbing+deep; 1291 GB data, 2612 GB used, 19645 GB / 22257 GB avail; 7/10140464 degraded (0.000%); 3/5070232 unfound (0.000%)
>    mdsmap e1: 0/0/1 up
>
> Should we open a ticket for this init issue with the rados gateway?
> Version is 0.56.1 upgraded from 0.55.
>
> --
> Yann ROBIN
> YouScribe
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html

