Re: how to debug pg inconsistent state - no ioerrors seen

On Tue, Aug 9, 2016 at 2:00 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
> Hi,
>
> I did a diff on the directories of all three OSDs and found no difference,
> so I don't know what's wrong.

omap (as implied by the omap_digest complaint) is stored in the OSD's
leveldb, not in the data directories, so you wouldn't expect to see any
differences from a raw diff. I think you can extract the omaps as well
using ceph-objectstore-tool (I haven't done it myself) and compare those.
You should see whether you get more useful information out of the pg
query first, though!
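
Something along these lines might help -- off the top of my head and
untested, so treat it as a sketch and adjust paths and IDs to your setup:

    # more detail on what scrub actually flagged
    ceph pg 6.2f4 query
    rados list-inconsistent-obj 6.2f4 --format=json-pretty

    # with the OSD stopped, dump the omap keys of the object on each
    # replica (repeat for osd.3, osd.5 and osd.1) and compare the output
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal \
        --pgid 6.2f4 '606.00000000' list-omap
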
-Greg

>
> The only thing I see that's different is a scrub file in the TEMP folder
> (this is already a different pg than in my last mail):
>
> -rw-r--r--    1 ceph ceph     0 Aug  9 09:51
> scrub\u6.107__head_00000107__fffffffffffffff8
>
> But it is empty.
>
> Thanks!
>
>
>
> On 09/08/16 04:33, Goncalo Borges wrote:
>>
>> Hi Kenneth...
>>
>> The previous default behavior of 'ceph pg repair' was to copy the pg
>> objects from the primary osd to the others. I am not sure if that is still
>> the case in Jewel. For this reason, once we get this kind of error in a
>> data pool, the best practice is to compare the md5 checksums of the
>> damaged object on all osds involved in the inconsistent pg. Since we have
>> a 3 replica cluster, we should find a quorum of 2 good objects. If by
>> chance the primary osd has the wrong object, you should delete it before
>> running the repair.
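>>
>> Something like this, run against the pg directory on each osd in the
>> acting set, might do it (just a sketch, not tested here; I am using the
>> pg and object name from your log only to show the pattern):
>>
>>     find /var/lib/ceph/osd/ceph-3/current/6.2f4_head \
>>         -name '*606.00000000*' -exec md5sum {} \;
>>
>> If the two non-primary copies agree and the primary differs, remove the
>> bad copy from the primary's pg directory (with that osd stopped) before
>> running the repair.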
>>
>> On a metadata pool I am not sure exactly how to cross-check, since all
>> objects are size 0 and therefore md5sum is meaningless. Maybe one way
>> forward could be to check the contents of the pg directories (ex:
>> /var/lib/ceph/osd/ceph-0/current/5.161_head/) on all osds involved in the
>> pg and see if we spot something wrong?
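>>
>> For instance (again just a rough sketch; the replicas live on different
>> osds/hosts, so collect a listing on each one and compare them afterwards):
>>
>>     cd /var/lib/ceph/osd/ceph-3/current/6.2f4_head
>>     find . -printf '%p %s\n' | sort > /tmp/pg6.2f4-osd3.lst
>>
>>     # same on the osds holding the other replicas, then e.g.
>>     diff /tmp/pg6.2f4-osd3.lst /tmp/pg6.2f4-osd5.lst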
>>
>> Cheers
>>
>> G.
>>
>>
>> On 08/08/2016 09:40 PM, Kenneth Waegeman wrote:
>>>
>>> Hi all,
>>>
>>> Since last week, some pgs have been going into the inconsistent state
>>> after a scrub error. Last week we had 4 pgs in that state; they were on
>>> different OSDs, but all in the metadata pool.
>>> I did a pg repair on them and all were healthy again, but now one pg is
>>> inconsistent again.
>>>
>>> with health detail I see:
>>>
>>> pg 6.2f4 is active+clean+inconsistent, acting [3,5,1]
>>> 1 scrub errors
>>>
>>> And in the log of the primary:
>>>
>>> 2016-08-06 06:24:44.723224 7fc5493f3700 -1 log_channel(cluster) log [ERR]
>>> : 6.2f4 shard 5: soid 6:2f55791f:::606.00000000:head omap_digest 0x3a105358
>>> != best guess omap_digest 0xc85c4361 from auth shard 1
>>> 2016-08-06 06:24:53.931029 7fc54bbf8700 -1 log_channel(cluster) log [ERR]
>>> : 6.2f4 deep-scrub 0 missing, 1 inconsistent objects
>>> 2016-08-06 06:24:53.931055 7fc54bbf8700 -1 log_channel(cluster) log [ERR]
>>> : 6.2f4 deep-scrub 1 errors
>>>
>>> I looked in dmesg but I couldn't see any IO errors on any of the OSDs in
>>> the acting set. Last week it was another set. It is of course possible
>>> that more than one OSD is failing, but how can we check this, since there
>>> is nothing more in the logs?
>>>
>>> Thanks !!
>>>
>>> K
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


