Can either of you reproduce with logs? That would make it a lot easier
to track down if it's a bug. I'd want

  debug osd = 20
  debug ms = 1
  debug filestore = 20

on all of the OSDs for a particular PG from when it is clean until it
develops an unfound object.
-Sam

On Wed, Jun 1, 2016 at 5:36 AM, Diego Castro <diego.castro@xxxxxxxxxxxxxx> wrote:
> Hello Uwe, I also have the sortbitwise flag enabled and I see exactly the
> same behavior as you.
> Perhaps this is also the root of my issues. Does anybody know whether it is
> safe to disable it?
>
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
>
> 2016-06-01 7:17 GMT-03:00 Uwe Mesecke <uwe@xxxxxxxxxxx>:
>>
>> > Am 01.06.2016 um 10:25 schrieb Diego Castro <diego.castro@xxxxxxxxxxxxxx>:
>> >
>> > Hello, I have a cluster running Jewel 10.2.0, 25 OSDs + 4 mons.
>> > Today my cluster suddenly went unhealthy with lots of stuck PGs due to
>> > unfound objects; there were no disk failures and no node crashes, it
>> > just went bad.
>> >
>> > I managed to bring the cluster back to a healthy state by marking the
>> > lost objects for deletion with "ceph pg <id> mark_unfound_lost delete".
>> > Apart from having no idea why the cluster went bad in the first place,
>> > I noticed that restarting the OSD daemons to unblock stuck clients made
>> > the cluster unhealthy again, with PGs stuck once more due to unfound
>> > objects.
>> >
>> > Does anyone else have this issue?
>>
>> Hi,
>>
>> I also ran into that problem after upgrading to Jewel. In my case I was
>> able to somewhat correlate this behavior with setting the sortbitwise
>> flag after the upgrade. When the flag is set, these unfound objects start
>> popping up after some time. Restarting OSDs just makes it worse and/or
>> makes the problems appear faster. When looking at the missing objects I
>> can see that sometimes even region or zone configuration objects for
>> radosgw are missing, which I know are there because radosgw was using
>> them just before.
>>
>> After unsetting the sortbitwise flag, the PGs go back to normal, all
>> previously unfound objects are found, and the cluster becomes healthy
>> again.
>>
>> Of course I'm not sure whether this is the real root of the problem or
>> just a coincidence, but I can reproduce this behavior every time.
>>
>> So for now the cluster is running without this flag. :-/
>>
>> Regards,
>> Uwe
>>
>> >
>> > ---
>> > Diego Castro / The CloudFather
>> > GetupCloud.com - Eliminamos a Gravidade
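
For anyone gathering the logs Sam asks for above, a minimal sketch of how
those debug levels could be applied, assuming a standard ceph.conf [osd]
section and the generic osd.* injectargs target (adjust the target to the
specific OSDs that host the affected PG):

  [osd]
      debug osd = 20
      debug ms = 1
      debug filestore = 20

  # or injected at runtime, without restarting the daemons:
  ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'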
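
A sketch of how the unfound objects described in the thread can be inspected
before resorting to mark_unfound_lost; <pgid> is a placeholder for the
affected placement group:

  ceph health detail                       # lists PGs reporting unfound objects
  ceph pg <pgid> list_missing              # shows which objects are unfound
  ceph pg <pgid> query                     # peering/recovery state for that PG
  ceph pg <pgid> mark_unfound_lost delete  # last resort, as used by Diego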
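
And a sketch of the sortbitwise handling Uwe describes; these are only the
flag commands themselves, the thread leaves open whether unsetting the flag
is actually safe on Jewel:

  ceph osd dump | grep flags   # check whether sortbitwise is currently set
  ceph osd unset sortbitwise   # what Uwe did to recover
  ceph osd set sortbitwise     # re-enable the flag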