That seems like an authentication issue? Try running it like so...

$ ceph --debug_monc 20 --debug_auth 20 pg 18.2 query
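If the caps on the key are the problem, a quick check like the sketch below
should show it. This is illustrative only: client.admin and
/etc/ceph/ceph.client.admin.keyring are just the usual defaults, so substitute
whatever name and keyring your shell is actually picking up.

# Show the key and its mon/osd caps as the cluster sees them
$ ceph auth get client.admin

# Re-run the query while being explicit about which identity to use
$ ceph --name client.admin \
       --keyring /etc/ceph/ceph.client.admin.keyring \
       pg 18.2 query

If `ceph auth get` itself fails, or the caps line lacks at least `mon 'allow
r'`, that would explain the EPERM.
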
On Thu, Jun 21, 2018 at 12:18 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Hi Brad,
>
> Yes, but it doesn't show much:
>
> ceph pg 18.2 query
> Error EPERM: problem getting command descriptions from pg.18.2
>
> Cheers
>
> ----- Original Message -----
>> From: "Brad Hubbard" <bhubbard@xxxxxxxxxx>
>> To: "andrei" <andrei@xxxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Wednesday, 20 June, 2018 00:02:07
>> Subject: Re: fixing unrepairable inconsistent PG
>>
>> Can you post the output of a pg query?
>>
>> On Tue, Jun 19, 2018 at 11:44 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>> A quick update on my issue. I have noticed that while I was trying to move
>>> the problem object on the osds, the file attributes got lost on one of
>>> them, which I guess is why the error messages showed the 'no attr' errors.
>>>
>>> I then copied the attribute metadata to the problematic object and
>>> restarted the osds in question. Following a pg repair I got a different
>>> error:
>>>
>>> 2018-06-19 13:51:05.846033 osd.21 osd.21 192.168.168.203:6828/24339 2 :
>>> cluster [ERR] 18.2 shard 21: soid 18:45f87722:::.dir.default.80018061.2:head
>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>> alloc_hint [0 0 0])
>>> 2018-06-19 13:51:05.846042 osd.21 osd.21 192.168.168.203:6828/24339 3 :
>>> cluster [ERR] 18.2 shard 28: soid 18:45f87722:::.dir.default.80018061.2:head
>>> omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi
>>> 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910
>>> dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871
>>> alloc_hint [0 0 0])
>>> 2018-06-19 13:51:05.846046 osd.21 osd.21 192.168.168.203:6828/24339 4 :
>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>> to pick suitable auth object
>>> 2018-06-19 13:51:05.846118 osd.21 osd.21 192.168.168.203:6828/24339 5 :
>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>> attr
>>> 2018-06-19 13:51:05.846129 osd.21 osd.21 192.168.168.203:6828/24339 6 :
>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>> 'snapset' attr
>>> 2018-06-19 13:51:09.810878 osd.21 osd.21 192.168.168.203:6828/24339 7 :
>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>>
>>> It mentions that there is an incorrect omap_digest. How do I go about
>>> fixing this?
>>>
>>> Cheers
>>>
>>> ________________________________
>>>
>>> From: "andrei" <andrei@xxxxxxxxxx>
>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>> Sent: Tuesday, 19 June, 2018 11:16:22
>>> Subject: fixing unrepairable inconsistent PG
>>>
>>> Hello everyone
>>>
>>> I am having trouble repairing one inconsistent and stubborn PG. I get the
>>> following error in ceph.log:
>>>
>>> 2018-06-19 11:00:00.000225 mon.arh-ibstorage1-ib mon.0
>>> 192.168.168.201:6789/0 675 : cluster [ERR] overall HEALTH_ERR noout flag(s)
>>> set; 4 scrub errors; Possible data damage: 1 pg inconsistent; application
>>> not enabled on 4 pool(s)
>>> 2018-06-19 11:09:24.586392 mon.arh-ibstorage1-ib mon.0
>>> 192.168.168.201:6789/0 841 : cluster [ERR] Health check update: Possible
>>> data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>>> 2018-06-19 11:09:27.139504 osd.21 osd.21 192.168.168.203:6828/4003 2 :
>>> cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed
>>> to pick suitable object info
>>> 2018-06-19 11:09:27.139545 osd.21 osd.21 192.168.168.203:6828/4003 3 :
>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_'
>>> attr
>>> 2018-06-19 11:09:27.139550 osd.21 osd.21 192.168.168.203:6828/4003 4 :
>>> cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no
>>> 'snapset' attr
>>> 2018-06-19 11:09:35.484402 osd.21 osd.21 192.168.168.203:6828/4003 5 :
>>> cluster [ERR] 18.2 repair 4 errors, 0 fixed
>>> 2018-06-19 11:09:40.601657 mon.arh-ibstorage1-ib mon.0
>>> 192.168.168.201:6789/0 844 : cluster [ERR] Health check update: Possible
>>> data damage: 1 pg inconsistent (PG_DAMAGED)
>>>
>>> I have tried to follow a few sets of instructions on PG repair, including
>>> removing the 'broken' object .dir.default.80018061.2 from the primary osd
>>> followed by a pg repair. When that didn't work, I did the same for the
>>> secondary osd. Still the same issue.
>>>
>>> Looking at the actual object on the file system, the file size is 0 for
>>> both the primary and secondary copies, and the md5sums match. The broken
>>> PG belongs to the radosgw bucket index pool, .rgw.buckets.index.
>>>
>>> What else can I try to get this fixed?
>>>
>>> Cheers
>>
>> --
>> Cheers,
>> Brad

--
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
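
For anyone hitting the same omap_digest mismatch on a bucket-index object, a
minimal sketch of comparing the two copies offline with ceph-objectstore-tool.
Everything below is illustrative: osd IDs 21/28 come from the logs above, the
data path assumes a default FileStore layout, and systemctl assumes a systemd
host; adapt all of them to your cluster.

# Stop the OSD first; ceph-objectstore-tool needs exclusive access.
$ systemctl stop ceph-osd@21

# A bucket-index object keeps its data in omap, which is why the on-disk
# file is 0 bytes. List the xattrs ('_' and 'snapset' should both be
# present) and dump the omap keys so the replicas can be diffed offline.
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
      --pgid 18.2 '.dir.default.80018061.2' list-attrs
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
      --pgid 18.2 '.dir.default.80018061.2' list-omap > omap.osd21

$ systemctl start ceph-osd@21

Repeating this on osd.28 and diffing the omap dumps should show which replica
diverged. Alternatively, `rados list-inconsistent-obj 18.2 --format=json-pretty`
(run after a fresh deep-scrub) reports the per-shard digests without taking an
OSD down.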