On Tue, Dec 23, 2014 at 4:17 AM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
> Oh, that's a bit less interesting. The bug might still be around, though.
> -Sam
>
> On Mon, Dec 22, 2014 at 2:50 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> On Tue, Dec 23, 2014 at 1:12 AM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>> You'll have to reproduce with logs on all three nodes. I suggest you
>>> open a high-priority bug and attach the logs.
>>>
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>>
>>> I'll be out for the holidays, but I should be able to look at it when
>>> I get back.
>>> -Sam
>>>
>>
>> Thanks Sam,
>>
>> Although I am not sure whether this is of more than historical interest
>> (the cluster in question is still running Cuttlefish), I'll try to
>> collect logs for the scrub.

Same stuff:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg15447.html
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg14918.html

It looks like the issue is still with us, though it seems to need metadata
or file-structure corruption to show itself. I'll check whether it can be
reproduced by rsync -X of a secondary PG subdirectory onto the primary's
PG subdirectory, or vice versa (rough sketch below). In my case the same
objects, with identical checksums, show up under slightly different
pathnames, which may be the root cause. Since every case mentioned here,
including mine, happened after a hardware failure, I suspect the incurable
corruption is introduced while the primary backfills from a surviving
replica during recovery.
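
For the record, roughly what I have in mind for the reproduction attempt.
This is only a sketch: the pg id 2.1f, the osd ids and the hostnames are
made up for illustration and would have to match the actual cluster, and
the target OSD has to be stopped before its filestore is touched:

  # pull the replica's copy of the PG directory onto the primary,
  # preserving xattrs (-X) so object metadata comes along with the files
  rsync -aX --delete \
      root@sec-host:/var/lib/ceph/osd/ceph-12/current/2.1f_head/ \
      /var/lib/ceph/osd/ceph-3/current/2.1f_head/

  # restart the osd, then force a scrub of that PG and watch for
  # inconsistency reports
  ceph pg scrub 2.1f

and then the same in the other direction (primary -> secondary). If the
differing pathnames alone are enough to make scrub flag the PG, that would
point at the backfill path rather than at the failed hardware itself.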