http://tracker.ceph.com/issues/16113

I think I found the bug. Thanks for the report! Turning off sortbitwise
should be an OK workaround for the moment.
-Sam

On Wed, Jun 1, 2016 at 3:00 PM, Diego Castro <diego.castro@xxxxxxxxxxxxxx> wrote:
> Yes, it was created as Hammer.
> I haven't faced any issues during the upgrade (apart from the well-known
> systemd ones), and after that the cluster didn't show any suspicious
> behavior.
>
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
>
> 2016-06-01 18:57 GMT-03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>
>> Was this cluster upgraded to Jewel? If so, at what version did it start?
>> -Sam
>>
>> On Wed, Jun 1, 2016 at 1:48 PM, Diego Castro
>> <diego.castro@xxxxxxxxxxxxxx> wrote:
>> > Hello Samuel, I'm a bit afraid of restarting my OSDs again; I'll wait
>> > until the weekend to push the config.
>> > BTW, I just unset the sortbitwise flag.
>> >
>> > ---
>> > Diego Castro / The CloudFather
>> > GetupCloud.com - Eliminamos a Gravidade
>> >
>> > 2016-06-01 13:39 GMT-03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >>
>> >> Can either of you reproduce this with logs? That would make it a lot
>> >> easier to track down if it's a bug. I'd want
>> >>
>> >> debug osd = 20
>> >> debug ms = 1
>> >> debug filestore = 20
>> >>
>> >> on all of the OSDs for a particular PG, from when it is clean until it
>> >> develops an unfound object.
>> >> -Sam
>> >>
>> >> On Wed, Jun 1, 2016 at 5:36 AM, Diego Castro
>> >> <diego.castro@xxxxxxxxxxxxxx> wrote:
>> >> > Hello Uwe, I also have the sortbitwise flag enabled and I see
>> >> > exactly the same behavior as you.
>> >> > Perhaps this is also the root of my issues. Does anybody know
>> >> > whether it is safe to disable it?
>> >> >
>> >> > ---
>> >> > Diego Castro / The CloudFather
>> >> > GetupCloud.com - Eliminamos a Gravidade
>> >> >
>> >> > 2016-06-01 7:17 GMT-03:00 Uwe Mesecke <uwe@xxxxxxxxxxx>:
>> >> >>
>> >> >> > On 01.06.2016 at 10:25, Diego Castro
>> >> >> > <diego.castro@xxxxxxxxxxxxxx> wrote:
>> >> >> >
>> >> >> > Hello, I have a cluster running Jewel 10.2.0, 25 OSDs + 4 mons.
>> >> >> > Today my cluster suddenly went unhealthy, with lots of stuck PGs
>> >> >> > due to unfound objects; there were no disk failures and no node
>> >> >> > crashes, it just went bad.
>> >> >> >
>> >> >> > I managed to bring the cluster back to a healthy state by
>> >> >> > marking the lost objects for deletion:
>> >> >> > "ceph pg <id> mark_unfound_lost delete".
>> >> >> > Although I have no idea why the cluster went bad in the first
>> >> >> > place, I noticed that restarting the OSD daemons to unblock
>> >> >> > stuck clients made the cluster unhealthy again, and PGs got
>> >> >> > stuck again due to unfound objects.
>> >> >> >
>> >> >> > Does anyone else have this issue?
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I also ran into that problem after upgrading to Jewel. In my case
>> >> >> I was able to somewhat correlate this behavior with setting the
>> >> >> sortbitwise flag after the upgrade. When the flag is set, these
>> >> >> unfound objects start popping up after some time. Restarting OSDs
>> >> >> just makes it worse and/or makes the problems appear faster. When
>> >> >> looking at the missing objects I can see that sometimes even
>> >> >> region or zone configuration objects for radosgw are missing,
>> >> >> which I know are there because radosgw was using them just before.
>> >> >>
>> >> >> After unsetting the sortbitwise flag, the PGs go back to normal,
>> >> >> all previously unfound objects are found, and the cluster becomes
>> >> >> healthy again.
>> >> >>
>> >> >> Of course I'm not sure whether this is the real root of the
>> >> >> problem or just a coincidence, but I can reproduce this behavior
>> >> >> every time.
>> >> >>
>> >> >> So for now the cluster is running without this flag. :-/
>> >> >>
>> >> >> Regards,
>> >> >> Uwe
>> >> >>
>> >> >> >
>> >> >> > ---
>> >> >> > Diego Castro / The CloudFather
>> >> >> > GetupCloud.com - Eliminamos a Gravidade

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
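For reference, the operations discussed in this thread map to the Ceph CLI roughly as follows. This is a sketch, not a recipe: `<pgid>` is a placeholder for a real placement group id, and `mark_unfound_lost delete` permanently discards data, so it is a last resort.

```shell
# Workaround suggested by Sam: clear the sortbitwise flag cluster-wide.
ceph osd unset sortbitwise

# Debug levels Sam asked for. They can go in ceph.conf on each OSD host:
#   [osd]
#   debug osd = 20
#   debug ms = 1
#   debug filestore = 20
# or be injected at runtime without restarting the daemons:
ceph tell 'osd.*' injectargs '--debug_osd 20 --debug_ms 1 --debug_filestore 20'

# Inspect the damage before giving anything up:
ceph health detail      # lists PGs with unfound objects
ceph pg <pgid> query    # per-PG peering/recovery state

# Last resort used earlier in the thread (deletes the unfound objects!):
ceph pg <pgid> mark_unfound_lost delete
```

Note that unsetting sortbitwise only worked here because the cluster started life as Hammer; the tracker issue linked at the top covers the underlying bug.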