What version of Ceph are you running? Is this a replicated or
erasure-coded pool?

On Fri, Dec 12, 2014 at 1:11 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:
> Hi Greg,
>
> thanks for your help. It's always highly appreciated. :)
>
> On Thu, Dec 11, 2014 at 6:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito <periquito@xxxxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > I've stopped OSD.16, removed the PG from the local filesystem and
>> > started the OSD again. After Ceph rebuilt the PG on that OSD I ran a
>> > deep-scrub, and the PG is still inconsistent.
>>
>> What led you to remove it from osd 16? Is that the one hosting the log
>> you snipped from? Is osd 16 the one hosting shard 6 of that PG, or was
>> it the primary?
>
> OSD.16 is both the primary for this PG and the one with the snipped log.
> The other 3 OSDs don't have any mention of this PG in their logs, just
> some messages about slow requests and the backfill when I removed the
> object. The inconsistent shard actually came from OSD.6 (currently we
> don't have an OSD.3).
>
> This is the output of pg dump for this PG:
>
> 9.180  25614  0  0  0  23306482348  3001  3001  active+clean+inconsistent
> 2014-12-10 17:29:01.937929  40242'1108124  40242:23305321  [16,10,27,6]  16
> [16,10,27,6]  16  40242'1071363  2014-12-10 17:29:01.937881
> 40242'1071363  2014-12-10 17:29:01.937881
>
>>
>> Anyway, the message means that shard 6 (which I think is the seventh
>> OSD in the list) of PG 9.180 is missing a bunch of xattrs on object
>> 370cbf80/29145.4_xxx/head//9. I'm actually a little surprised it
>> didn't crash if it's missing the "_" attr....
>> -Greg
>
> Any idea on how to fix it?
>
>>
>> > I'm running out of ideas on trying to solve this. Does this mean that
>> > all copies of the object should also be inconsistent? Should I just
>> > try to figure out which object/bucket this belongs to and delete it,
>> > then copy it to the ceph cluster again?
>> >
>> > Also, do you know what the error message means? Is it just some sort
>> > of metadata for this object that isn't correct, not the object itself?
>> >
>> > On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito <periquito@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> In the last few days this PG (pool is .rgw.buckets) has been in error
>> >> after running the scrub process.
>> >>
>> >> After getting the error, and trying to see what the issue might be
>> >> (and finding none), I just issued a ceph repair followed by a ceph
>> >> deep-scrub. However, it doesn't seem to have fixed the issue, which
>> >> still remains.
>> >>
>> >> The relevant log from the OSD is as follows:
>> >>
>> >> 2014-12-10 09:38:09.348110 7f8f618be700 0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> >> 2014-12-10 09:38:09.348116 7f8f618be700 0 log [ERR] : 9.180 deep-scrub 1 errors
>> >> 2014-12-10 10:13:15.922065 7f8f618be700 0 log [INF] : 9.180 repair ok, 0 fixed
>> >> 2014-12-10 10:55:27.556358 7f8f618be700 0 log [ERR] : 9.180 shard 6: soid 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat, missing attr snapset
>> >> 2014-12-10 10:56:50.597952 7f8f618be700 0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> >> 2014-12-10 10:56:50.597957 7f8f618be700 0 log [ERR] : 9.180 deep-scrub 1 errors
>> >>
>> >> I'm running firefly, version 0.80.7.
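For reference, the "shard 6: ... missing attr" lines above amount to a set
difference between the xattrs the primary holds for the object and the xattrs
found on shard 6's copy. This is a hypothetical sketch (not Ceph code) of that
comparison, using the attr names taken from the scrub log; "_" is the object
info attr and "snapset" the snapshot metadata attr:

```python
# Attrs the primary (osd.16) holds for 370cbf80/29145.4_xxx/head//9,
# per the scrub log above.
primary_attrs = {
    "_", "snapset",
    "_user.rgw.acl", "_user.rgw.content_type", "_user.rgw.etag",
    "_user.rgw.idtag", "_user.rgw.manifest",
    "_user.rgw.x-amz-meta-md5sum", "_user.rgw.x-amz-meta-stat",
}

# Shard 6's copy, which the log reports as having none of them.
shard6_attrs = set()

# Deep-scrub flags every attr present on the primary but absent on the shard.
missing = sorted(primary_attrs - shard6_attrs)
for attr in missing:
    print("missing attr", attr)
print(len(missing), "attrs missing on shard 6")
```

Since every attr is missing, including "_", shard 6's copy has effectively no
object metadata at all, which matches Greg's surprise that the OSD didn't crash.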
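A side note on reading the soid: the hex prefix 370cbf80 in
370cbf80/29145.4_xxx/head//9 is the object-name hash, and the PG id comes from
folding that hash onto the pool's pg_num. As a sketch, assuming .rgw.buckets
here has pg_num = 512 (an assumption, not stated in the thread; for a
power-of-two pg_num the fold reduces to masking the low bits):

```python
# Sketch: map the soid hash prefix to a placement-group id,
# assuming pg_num = 512 for pool 9 (.rgw.buckets).
obj_hash = 0x370cbf80   # hash prefix from soid 370cbf80/29145.4_xxx/head//9
pg_num = 512            # assumed pool pg_num (power of two)

ps = obj_hash & (pg_num - 1)    # placement seed within the pool
print("PG id: 9.%x" % ps)       # matches the 9.180 seen in the logs
```

This is only meant to show why the 9.180 in the scrub log and the 370cbf80 in
the soid are consistent with each other.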
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com