Re: unable to repair PG


Hi Greg,

Thanks for your help, it's always much appreciated. :)

On Thu, Dec 11, 2014 at 6:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:
> Hi,
>
> I've stopped OSD.16, removed the PG from the local filesystem and started
> the OSD again. After Ceph rebuilt the PG on that OSD, I ran a
> deep-scrub and the PG is still inconsistent.

What led you to remove it from osd 16? Is that the one hosting the log
you snipped from? Is osd 16 the one hosting shard 6 of that PG, or was
it the primary?
OSD 16 is both the primary for this PG and the one that has the snipped log. The other 3 OSDs don't have any mention of this PG in their logs, just some messages about slow requests and the backfill when I removed the object. Actually, the error came from OSD.6; we don't currently have an OSD.3.

This is the pg dump output for this PG:
9.180    25614    0    0    0    23306482348    3001    3001    active+clean+inconsistent    2014-12-10 17:29:01.937929    40242'1108124    40242:23305321    [16,10,27,6]    16    [16,10,27,6]    16    40242'1071363    2014-12-10 17:29:01.937881    40242'1071363    2014-12-10 17:29:01.937881
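The dump row is hard to read in the archive. A minimal Python sketch can pull out the OSD sets. The helper name `parse_pg_row` is hypothetical, not a Ceph tool, and the column order (two "[set] primary" pairs for up and acting, state in the ninth field) is an assumption based on firefly-era `ceph pg dump` output. Read that way, both the up and acting sets are [16,10,27,6] with osd.16 as primary, so osd.6, where the error came from, is one of the replicas.

```python
import re

# The pg dump row from above. The regex below also copes with the acting
# set and its primary being glued together ("[16,10,27,6]16").
PG_DUMP_ROW = (
    "9.180    25614    0    0    0    23306482348    3001    3001    "
    "active+clean+inconsistent    2014-12-10 17:29:01.937929    "
    "40242'1108124    40242:23305321    [16,10,27,6]    16    "
    "[16,10,27,6]    16    40242'1071363    2014-12-10 17:29:01.937881    "
    "40242'1071363    2014-12-10 17:29:01.937881"
)

# Assumption: the first "[list] id" pair is up/up_primary and the second
# is acting/acting_primary, as in firefly-era `ceph pg dump` output.
OSD_SET_RE = re.compile(r"\[([\d,]+)\]\s*(\d+)")


def parse_pg_row(row: str) -> dict:
    """Extract pgid, state and the up/acting OSD sets from one dump row."""
    tokens = row.split()
    sets = OSD_SET_RE.findall(row)
    if len(sets) < 2:
        raise ValueError("expected both an up set and an acting set")
    (up, up_primary), (acting, acting_primary) = sets[:2]

    def to_ids(csv: str) -> list:
        return [int(x) for x in csv.split(",")]

    return {
        "pgid": tokens[0],
        "state": tokens[8],  # ninth whitespace-separated field in this layout
        "up": to_ids(up),
        "up_primary": int(up_primary),
        "acting": to_ids(acting),
        "acting_primary": int(acting_primary),
    }


info = parse_pg_row(PG_DUMP_ROW)
print(info["acting"], info["acting_primary"])  # [16, 10, 27, 6] 16
print("inconsistent" in info["state"].split("+"))  # True
```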
 
Anyway, the message means that shard 6 (which I think is the seventh
OSD in the list) of PG 9.180 is missing a bunch of xattrs on object
370cbf80/29145.4_xxx/head//9. I'm actually a little surprised it
didn't crash if it's missing the "_" attr....
-Greg
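Greg's reading of the error line can be spelled out with a short Python sketch. The helper `parse_scrub_error` is hypothetical, not part of Ceph. It assumes the soid is printed as hash/name/snap/key/pool, that RGW names data objects `<bucket marker>_<object name>` (which would make 29145.4 the bucket marker), and that `_` (object info) and `snapset` are OSD-internal attributes while the `_user.*` ones are RGW's own xattrs. The `xxx` is the redaction from the original log and is kept as-is.

```python
import re

# The shard-6 error line from the OSD log, re-joined onto one line.
ERR_LINE = (
    "2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: "
    "soid 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr "
    "_user.rgw.acl, missing attr _user.rgw.content_type, missing attr "
    "_user.rgw.etag, missing attr _user.rgw.idtag, missing attr "
    "_user.rgw.manifest, missing attr _user.rgw.x-amz-meta-md5sum, missing "
    "attr _user.rgw.x-amz-meta-stat, missing attr snapset"
)

SHARD_ERR_RE = re.compile(
    r"\[ERR\] : (?P<pgid>\S+) shard (?P<shard>\d+): soid (?P<soid>\S+) (?P<rest>.*)"
)


def parse_scrub_error(line):
    """Split a 'shard N: soid ... missing attr ...' scrub error into parts.

    Returns None for lines that are not per-shard object errors.
    """
    m = SHARD_ERR_RE.search(line)
    if m is None:
        return None
    # Assumed soid layout: <hash>/<name>/<snap>/<key>/<pool>. The name may
    # itself contain '/', so split the hash off the front and the last
    # three fields off the back.
    obj_hash, tail = m.group("soid").split("/", 1)
    name, snap, key, pool = tail.rsplit("/", 3)
    # Assumption: RGW data objects are named <bucket marker>_<object name>.
    marker, _, rgw_object = name.partition("_")
    missing = re.findall(r"missing attr ([^,\s]+)", m.group("rest"))
    return {
        "pgid": m.group("pgid"),
        "shard": int(m.group("shard")),
        "hash": obj_hash,
        "name": name,
        "snap": snap,
        "key": key,
        "pool": int(pool),
        "bucket_marker": marker,
        "rgw_object": rgw_object,
        "missing": missing,
        # Assumption: '_' (object info) and 'snapset' are OSD-internal,
        # while '_user.*' are RGW's own xattrs.
        "internal_missing": [a for a in missing if not a.startswith("_user.")],
    }


err = parse_scrub_error(ERR_LINE)
print(err["shard"], err["snap"], err["internal_missing"])  # 6 head ['_', 'snapset']
# The pool id at the end of the soid should match the pool part of the pgid.
print(err["pool"] == int(err["pgid"].split(".")[0]))  # True
```

If the assumptions hold, the copy on shard 6 is missing both the OSD's own bookkeeping attributes and every RGW attribute, not just one piece of user metadata.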

Any idea how to fix it?
 

>
> I'm running out of ideas on how to solve this. Does this mean that all
> copies of the object should also be inconsistent? Should I just try to
> figure out which object/bucket this belongs to and delete it/copy it again to
> the ceph cluster?
>
> Also, do you know what the error message means? Is it just some sort of
> metadata for this object that isn't correct, rather than the object itself?
>
> On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito <periquito@xxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> For the last few days this PG (in pool .rgw.buckets) has been in an error
>> state after scrubbing.
>>
>> After getting the error, and trying to find the cause (and finding
>> nothing), I issued a ceph pg repair followed by a ceph pg deep-scrub.
>> However, it doesn't seem to have fixed the issue; the error remains.
>>
>> The relevant log from the OSD is as follows.
>>
>> 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
>> 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180 repair ok, 0 fixed
>> 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard 6: soid 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.x-amz-meta-md5sum, missing attr _user.rgw.x-amz-meta-stat, missing attr snapset
>> 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 0 missing, 1 inconsistent objects
>> 2014-12-10 10:56:50.597957 7f8f618be700  0 log [ERR] : 9.180 deep-scrub 1 errors
>>
>> I'm running Firefly (0.80.7).
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

