On Tue, Jan 8, 2013 at 1:11 PM, Yann ROBIN <yann.robin@xxxxxxxxxxxxx> wrote:
> We lost data in notify and gc. What bothers me is that the rados gateway can start if we deactivate the cache.
> I think the availability of the cache objects shouldn't take down the rados gateway. The option should be more of an "I want the cache if available".

But what if some of the instances can access the objects and others can't? Then you've got some daemons caching data while the others aren't notifying them. This pretty much needs to be a manual switch, as far as I can imagine it working. Unless somebody else has ideas on improving it?
-Greg

>
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> Sent: Tuesday, January 8, 2013 18:03
> To: Yann ROBIN
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: Rados gateway init timeout with cache
>
> To clarify, you lost the data on half of your OSDs? And it sounds like they weren't in separate CRUSH failure domains?
>
> Given that, yep, you've lost some data. :(
>
> On Tue, Jan 8, 2013 at 5:41 AM, Yann ROBIN <yann.robin@xxxxxxxxxxxxx> wrote:
>> Notify and gc objects were unfound; we marked them as lost and now the rados gateway starts.
>> But this means that if some notify objects are not fully available, the rados gateway stops responding.
>
> Yes, that's the case. I'm not sure there's a way around it that makes much sense and satisfies the necessary guarantees, though.
> -Greg
>
>> -----Original Message-----
>> From: ceph-devel-owner@xxxxxxxxxxxxxxx
>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Yann ROBIN
>> Sent: Tuesday, January 8, 2013 12:13
>> To: ceph-devel@xxxxxxxxxxxxxxx
>> Subject: Rados gateway init timeout with cache
>>
>> Hi,
>>
>> We recently experienced an issue with the backplane of our server, resulting in losing half of our OSDs.
>> During that period the rados gateway failed to initialize (timeout).
>> We found that the gateway was hanging in the init_watch function.
>>
>> We recreated our OSDs and we still have this issue, but the PGs are not all in an active+clean state:
>>    health HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 2 pgs recovery_wait; 3 pgs stuck unclean; recovery 7/10140464 degraded (0.000%); 3/5070232 unfound (0.000%); noout flag(s) set
>>    monmap e2: 3 mons at {ceph-mon-1=172.20.1.13:6789/0,ceph-mon-2=172.20.2.13:6789/0,ceph-mon-3=172.17.9.20:6789/0}, election epoch 256, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
>>    osdmap e4439: 6 osds: 6 up, 6 in
>>    pgmap v2531184: 11024 pgs: 11019 active+clean, 2 active+recovery_wait, 1 active+recovering+degraded+remapped, 2 active+clean+scrubbing+deep; 1291 GB data, 2612 GB used, 19645 GB / 22257 GB avail; 7/10140464 degraded (0.000%); 3/5070232 unfound (0.000%)
>>    mdsmap e1: 0/0/1 up
>>
>> Should we open a ticket for this init issue with the rados gateway?
>> Version is 0.56.1, upgraded from 0.55.
>>
>> --
>> Yann ROBIN
>> YouScribe
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
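
The thread ties the gateway's init hang to unfound notify and gc objects, and the quoted status output reports them as "3/5070232 unfound". As a rough illustration only, here is a minimal Python sketch that pulls the unfound and degraded object counts out of a health line in the format quoted above. The function names and the regular expression are assumptions made for this example, not part of Ceph, and the health line format may differ in other versions.

```python
import re

# Sample health line copied from the status output quoted in this thread.
HEALTH_LINE = (
    "health HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 2 pgs recovery_wait; "
    "3 pgs stuck unclean; recovery 7/10140464 degraded (0.000%); "
    "3/5070232 unfound (0.000%); noout flag(s) set"
)

# Matches "<count>/<total> unfound" or "<count>/<total> degraded".
# Assumption: this mirrors the 0.56-era format shown above, nothing more.
_COUNT_RE = re.compile(r"(\d+)/(\d+)\s+(unfound|degraded)")


def parse_object_counts(health_line):
    """Return a dict mapping 'unfound'/'degraded' to (count, total) when present."""
    counts = {}
    for count, total, kind in _COUNT_RE.findall(health_line):
        counts[kind] = (int(count), int(total))
    return counts


def has_unfound_objects(health_line):
    """Return True if the health line reports at least one unfound object."""
    count, _total = parse_object_counts(health_line).get("unfound", (0, 0))
    return count > 0


if __name__ == "__main__":
    print(parse_object_counts(HEALTH_LINE))
    print(has_unfound_objects(HEALTH_LINE))
```

A check like this could run before starting the gateway with the cache enabled, so that a timeout in init_watch is less of a surprise. It does not address Greg's point about instances disagreeing on whether the cache objects are reachable.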