Re: unable to repair PG

Luis Periquito <periquito@xxxxxxxxx> · Fri, 12 Dec 2014 09:11:54 +0000

Hi Greg,

thanks for your help. It's always highly appreciated. :)

On Thu, Dec 11, 2014 at 6:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:

> Hi,
>
> I've stopped OSD.16, removed the PG from the local filesystem and started
> the OSD again. After Ceph rebuilt the PG on that OSD, I ran a
> deep-scrub and the PG is still inconsistent.

What led you to remove it from osd 16? Is that the one hosting the log you snipped from? Is osd 16 the one hosting shard 6 of that PG, or was it the primary?
OSD 16 is both the primary for this PG and the one that has the snipped log. None of the other 3 OSDs mention this PG in their logs, apart from some messages about slow requests and the backfill when I removed the object. Actually, it came from OSD.6; currently we don't have an OSD.3.

This is the pg dump output for this PG:
9.180    25614    0    0    0    23306482348    3001    3001    active+clean+inconsistent    2014-12-10 17:29:01.937929    40242'1108124    40242:23305321    [16,10,27,6]    16    [16,10,27,6]    16    40242'1071363    2014-12-10 17:29:01.937881    40242'1071363    2014-12-10 17:29:01.937881
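To read a row like the one above field by field, here is a minimal Python sketch. It assumes the firefly (0.80.x) `ceph pg dump` column order listed in `FIELDS` (pg_stat, objects, mip, degr, unf, bytes, log, disklog, state, state_stamp, v, reported, up, up_primary, acting, acting_primary, last_scrub, scrub_stamp, last_deep_scrub, deep_scrub_stamp). That order is an assumption; check it against the header line your own cluster prints. The helper names are hypothetical.

```python
import re

# ASSUMPTION: column order of a firefly-era (0.80.x) `ceph pg dump` PG row.
# Check it against the header line your own cluster prints.
FIELDS = [
    "pg_stat", "objects", "mip", "degr", "unf", "bytes", "log", "disklog",
    "state", "state_stamp", "v", "reported", "up", "up_primary",
    "acting", "acting_primary", "last_scrub", "scrub_stamp",
    "last_deep_scrub", "deep_scrub_stamp",
]

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def tokenize(row):
    """Split on whitespace, gluing each 'YYYY-MM-DD' token to the time after it."""
    raw = row.split()
    tokens, i = [], 0
    while i < len(raw):
        if DATE.match(raw[i]) and i + 1 < len(raw):
            tokens.append(raw[i] + " " + raw[i + 1])
            i += 2
        else:
            tokens.append(raw[i])
            i += 1
    return tokens


def parse_osd_set(text):
    """'[16,10,27,6]' -> [16, 10, 27, 6]"""
    return [int(x) for x in text.strip("[]").split(",") if x]


def parse_pg_row(row):
    """Map one PG row onto FIELDS, converting OSD sets, primaries and state."""
    values = tokenize(row)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    pg = dict(zip(FIELDS, values))
    for key in ("up", "acting"):
        pg[key] = parse_osd_set(pg[key])
    for key in ("up_primary", "acting_primary"):
        pg[key] = int(pg[key])
    pg["state"] = pg["state"].split("+")
    return pg


row = ("9.180 25614 0 0 0 23306482348 3001 3001 active+clean+inconsistent "
       "2014-12-10 17:29:01.937929 40242'1108124 40242:23305321 "
       "[16,10,27,6] 16 [16,10,27,6] 16 "
       "40242'1071363 2014-12-10 17:29:01.937881 "
       "40242'1071363 2014-12-10 17:29:01.937881")
pg = parse_pg_row(row)
print(pg["acting"], pg["acting_primary"], "inconsistent" in pg["state"])
# prints: [16, 10, 27, 6] 16 True
```

Under that column assumption, acting_primary comes out as 16, consistent with osd.16 being the primary for 9.180.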

Anyway, the message means that shard 6 (which I think is the seventh OSD in the list) of PG 9.180 is missing a bunch of xattrs on object 370cbf80/29145.4_xxx/head//9. I'm actually a little surprised it didn't crash if it's missing the "_" attr....

-Greg
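Greg's reading of the error can be made concrete with a minimal Python sketch. It pulls the PG id, shard number, object id and missing xattr names out of the `shard N: soid ... missing attr ...` line in the log. The regex is written against this one line only. Two further assumptions are labelled in the code: that firefly prints the object id as `hash/name/snap/namespace/pool`, and that RGW head objects are named `<bucket marker>_<object name>`. `parse_soid` and `guess_bucket_marker` are hypothetical helpers.

```python
import re

# Pattern written against the single "shard N: soid ..." line quoted in this
# thread; other Ceph releases may phrase the message differently.
SHARD_ERR = re.compile(
    r"(?P<pgid>\d+\.[0-9a-f]+) shard (?P<shard>\d+): "
    r"soid (?P<soid>\S+) (?P<rest>.*)$"
)
PREFIX = "missing attr "


def parse_shard_error(line):
    """Return pgid, shard, object id and the list of missing xattr names."""
    m = SHARD_ERR.search(line)
    if m is None:
        return None
    missing = [
        part.strip()[len(PREFIX):]
        for part in m.group("rest").split(",")
        if part.strip().startswith(PREFIX)
    ]
    return {
        "pgid": m.group("pgid"),
        "shard": int(m.group("shard")),
        "soid": m.group("soid"),
        "missing_attrs": missing,
    }


def parse_soid(soid):
    """ASSUMPTION: firefly prints objects as hash/name/snap/namespace/pool.
    Split from both ends so a '/' inside an RGW object name survives."""
    parts = soid.split("/")
    return {
        "hash": parts[0],
        "name": "/".join(parts[1:-3]),
        "snap": parts[-3],
        "namespace": parts[-2],
        "pool": int(parts[-1]),
    }


def guess_bucket_marker(rgw_name):
    """HYPOTHETICAL helper: assumes RGW head objects are named
    '<bucket marker>_<object name>', so the marker precedes the first '_'."""
    return rgw_name.partition("_")[0]


line = ("2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: "
        "soid 370cbf80/29145.4_xxx/head//9 missing attr _, "
        "missing attr _user.rgw.acl, missing attr _user.rgw.content_type, "
        "missing attr _user.rgw.etag, missing attr _user.rgw.idtag, "
        "missing attr _user.rgw.manifest, missing attr _user.rgw.x-amz-meta-md5sum, "
        "missing attr _user.rgw.x-amz-meta-stat, missing attr snapset")
err = parse_shard_error(line)
obj = parse_soid(err["soid"])
print(err["shard"], len(err["missing_attrs"]), obj["pool"], guess_bucket_marker(obj["name"]))
# prints: 6 9 9 29145.4
```

The pool parsed from the object id (9) matches the PG prefix 9.180. Under the naming assumption above, the bucket marker is a starting point for finding which bucket the object belongs to.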

Any idea how to fix it?

>
> I'm running out of ideas for solving this. Does this mean that all
> copies of the object should also be inconsistent? Should I just try to
> figure out which object/bucket this belongs to, then delete it and copy it
> again to the Ceph cluster?
>
> Also, do you know what the error message means? Is it just some sort of
> metadata for this object that isn't correct, rather than the object itself?
>
> On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> In the last few days this PG (pool .rgw.buckets) has been in an error
>> state after running the scrub process.
>>
>> After getting the error and trying to see what the issue might be (and
>> finding none), I issued a ceph pg repair followed by a ceph pg deep-scrub.
>> However, it doesn't seem to have fixed the issue, which still remains.
>>
>> The relevant log from the OSD is as follows:
>>
>> 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
>> 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0 fixed
>> 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: soid 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat, missing attr snapset
>> 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> 2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
>>
>> I'm running firefly 0.80.7.
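The log above shows a recognisable sequence: a deep-scrub error, then `repair ok, 0 fixed`, then the same deep-scrub error again. The following is a minimal Python sketch for checking that sequence mechanically. It assumes the message wording in this firefly 0.80.7 log, collects the deep-scrub and repair summaries, and reports whether errors reappeared after the last repair. Lines that match none of the patterns (such as the `shard 6` line) are skipped. `summarize` and `errors_after_last_repair` are hypothetical names.

```python
import re

# Patterns copied from the summary lines in the OSD log quoted above
# (firefly 0.80.7); other releases may word these messages differently.
SCRUB_COUNTS = re.compile(
    r"(\d+\.[0-9a-f]+) deep-scrub (\d+) missing, (\d+) inconsistent objects")
SCRUB_ERRORS = re.compile(r"(\d+\.[0-9a-f]+) deep-scrub (\d+) errors")
REPAIR = re.compile(r"(\d+\.[0-9a-f]+) repair ok, (\d+) fixed")


def summarize(lines):
    """Turn log lines into (timestamp, pgid, kind, details) tuples."""
    events = []
    for line in lines:
        stamp = " ".join(line.split()[:2])
        m = SCRUB_COUNTS.search(line)
        if m:
            events.append((stamp, m.group(1), "scrub",
                           {"missing": int(m.group(2)),
                            "inconsistent": int(m.group(3))}))
            continue
        m = SCRUB_ERRORS.search(line)
        if m:
            events.append((stamp, m.group(1), "errors",
                           {"errors": int(m.group(2))}))
            continue
        m = REPAIR.search(line)
        if m:
            events.append((stamp, m.group(1), "repair",
                           {"fixed": int(m.group(2))}))
    return events


def errors_after_last_repair(events, pgid):
    """True if a deep-scrub reported errors for pgid after its last repair."""
    repairs = [i for i, e in enumerate(events) if e[1] == pgid and e[2] == "repair"]
    if not repairs:
        return False
    return any(e[1] == pgid and e[2] == "errors" and e[3]["errors"] > 0
               for e in events[repairs[-1] + 1:])


log = """\
2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0 fixed
2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
"""
events = summarize(log.splitlines())
for stamp, pgid, kind, details in events:
    print(stamp, pgid, kind, details)
print("still inconsistent after repair:", errors_after_last_repair(events, "9.180"))
```

For this log the check returns True. The repair reported nothing fixed, and the next deep-scrub found the same single inconsistent object.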

> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
