Ceph PG repair

This PG/object is still doing something rather odd.

I attempted to repair the object, and the repair supposedly ran, but now I appear to have less visibility than before.

> $ ceph health detail
> HEALTH_ERR 3 pgs inconsistent; 4 scrub errors; mds0: Many clients (20) failing to respond to cache pressure; noout,sortbitwise,require_jewel_osds flag(s) set
> pg 10.2d8 is active+clean+inconsistent, acting [18,17,22]
> pg 10.7bd is active+clean+inconsistent, acting [8,23,17]
> pg 17.ec is active+clean+inconsistent, acting [23,2,21]
> 4 scrub errors
> noout,sortbitwise,require_jewel_osds flag(s) set


osd.23 is the OSD scheduled for replacement; it generated another read error.

However, 17.ec does not show up in the output of 'rados list-inconsistent-pg objects':

> $ rados list-inconsistent-pg objects
> ["10.2d8","10.7bd?]

And examining 10.2d8 as before, I'm presented with this:

> $ rados list-inconsistent-obj 10.2d8 --format=json-pretty
> {
>     "epoch": 21094,
>     "inconsistents": []
> }

Even though the logs show that both the deep scrub and the repair reported errors and the object was not repaired:

> $ zgrep 10.2d8 ceph-*
> ceph-osd.18.log.2.gz:2017-03-06 15:10:08.729827 7fc8dfeb8700  0 log_channel(cluster) log [INF] : 10.2d8 repair starts
> ceph-osd.18.log.2.gz:2017-03-06 15:13:49.793839 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 recorded data digest 0x7fa9879c != on disk 0xa6798e03 on {object.name}:head
> ceph-osd.18.log.2.gz:2017-03-06 15:13:49.793941 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : repair 10.2d8 {object.name}:head on disk size (15913) does not match object info size (10280) adjusted for ondisk to (10280)
> ceph-osd.18.log.2.gz:2017-03-06 15:46:13.286268 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 repair 2 errors, 0 fixed
> ceph-osd.18.log.4.gz:2017-03-04 18:16:23.693057 7fc8dd6b3700  0 log_channel(cluster) log [INF] : 10.2d8 deep-scrub starts
> ceph-osd.18.log.4.gz:2017-03-04 18:19:25.471322 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 recorded data digest 0x7fa9879c != on disk 0xa6798e03 on {object.name}:head
> ceph-osd.18.log.4.gz:2017-03-04 18:19:25.471403 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : deep-scrub 10.2d8 {object.name}:head on disk size (15913) does not match object info size (10280) adjusted for ondisk to (10280)
> ceph-osd.18.log.4.gz:2017-03-04 18:55:39.617841 7fc8dd6b3700 -1 log_channel(cluster) log [ERR] : 10.2d8 deep-scrub 2 errors
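
Since both the scrub and the repair complain that the on-disk size (15913) does not match the object info size (10280), I also want to see what size the cluster itself reports for the object. A rough sketch of what I mean, with the pool and namespace names as placeholders:

$ rados -p {pool.name} -N {namespace.name} stat {object.name}

If that comes back as 10280 while every replica on disk is 15913, that would at least suggest it's the recorded object info that is stale rather than the data.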


The file size and md5sum still match across all three replicas.

> ls -la /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> -rw-r--r-- 1 ceph ceph 15913 Mar  2 17:24 /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> -rw-r--r-- 1 ceph ceph 15913 Mar  2 17:24 /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> -rw-r--r-- 1 ceph ceph 15913 Mar  2 17:24 /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}

> md5sum /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
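
Beyond the file contents, I could also try comparing the object metadata that filestore keeps alongside the files. As I understand it (and I may be wrong on the exact attribute names), filestore stores the object info in an xattr such as user.ceph._ on the file, so dumping the xattrs on all three replicas and diffing them might show where the stale size/digest is recorded:

$ getfattr -d -m '.' -e hex /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}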


So is the object actually inconsistent?
Is rados somehow behind on something, not showing the third inconsistent PG?
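
One more thought, since the data itself looks identical on all three replicas and it's the recorded size/digest in the object info that appears stale: would reading the object out and writing it back with rados force the object info to be regenerated, or could that break whatever is using the object? Roughly what I have in mind (pool and namespace names are placeholders again):

$ rados -p {pool.name} -N {namespace.name} get {object.name} /tmp/{object.name}
$ rados -p {pool.name} -N {namespace.name} put {object.name} /tmp/{object.name}
$ ceph pg deep-scrub 10.2d8

I haven't tried this yet and would appreciate a sanity check before I do.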

Appreciate any help.

Reed

> On Mar 2, 2017, at 9:21 AM, Reed Dier <reed.dier at focusvq.com> wrote:
> 
> Over the weekend, two inconsistent PGs popped up in my cluster. This came after scrubs had been disabled for close to 6 weeks following a very long rebalance (adding 33% more OSDs, an OSD failing, increasing PGs, etc.).
> 
> It appears we came out the other end with 2 inconsistent PGs, and I'm trying to resolve them without much luck so far.
> Ubuntu 16.04, Jewel 10.2.5, 3x replicated pool for reference.
> 
>> $ ceph health detail
>> HEALTH_ERR 2 pgs inconsistent; 3 scrub errors; noout,sortbitwise,require_jewel_osds flag(s) set
>> pg 10.7bd is active+clean+inconsistent, acting [8,23,17]
>> pg 10.2d8 is active+clean+inconsistent, acting [18,17,22]
>> 3 scrub errors
> 
>> $ rados list-inconsistent-pg objects
>> ["10.2d8","10.7bd?]
> 
> Pretty straightforward: 2 PGs with inconsistent copies. Let's dig deeper.
> 
>> $ rados list-inconsistent-obj 10.2d8 --format=json-pretty
>> {
>>     "epoch": 21094,
>>     "inconsistents": [
>>         {
>>             "object": {
>>                 "name": "object.name",
>>                 "nspace": "namespace.name",
>>                 "locator": "",
>>                 "snap": "head"
>>             },
>>             "errors": [],
>>             "shards": [
>>                 {
>>                     "osd": 17,
>>                     "size": 15913,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0xa6798e03",
>>                     "errors": []
>>                 },
>>                 {
>>                     "osd": 18,
>>                     "size": 15913,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0xa6798e03",
>>                     "errors": []
>>                 },
>>                 {
>>                     "osd": 22,
>>                     "size": 15913,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0xa6798e03",
>>                     "errors": [
>>                         "data_digest_mismatch_oi"
>>                     ]
>>                 }
>>             ]
>>         }
>>     ]
>> }
> 
>> $ rados list-inconsistent-obj 10.7bd --format=json-pretty
>> {
>>     "epoch": 21070,
>>     "inconsistents": [
>>         {
>>             "object": {
>>                 "name": "object2.name",
>>                 "nspace": "namespace.name",
>>                 "locator": "",
>>                 "snap": "head"
>>             },
>>             "errors": [
>>                 "read_error"
>>             ],
>>             "shards": [
>>                 {
>>                     "osd": 8,
>>                     "size": 27691,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0x9ce36903",
>>                     "errors": []
>>                 },
>>                 {
>>                     "osd": 17,
>>                     "size": 27691,
>>                     "omap_digest": "0xffffffff",
>>                     "data_digest": "0x9ce36903",
>>                     "errors": []
>>                 },
>>                 {
>>                     "osd": 23,
>>                     "size": 27691,
>>                     "errors": [
>>                         "read_error"
>>                     ]
>>                 }
>>             ]
>>         }
>>     ]
>> }
> 
> 
> So we have one PG (10.7bd) with a read error on osd.23, which is known and scheduled for replacement.
> We also have a data digest mismatch on PG 10.2d8 on osd.22, which I have been attempting to repair with no real tangible results.
> 
>> $ ceph pg repair 10.2d8
>> instructing pg 10.2d8 on osd.18 to repair
> 
> I've run the ceph pg repair command multiple times, and each time it instructs osd.18 to repair the PG.
> Is it correct to assume that osd.18 is the acting primary for this PG, and that it is being told to write the known-good copy over the agreed-upon wrong version on osd.22?
> 
>> $ zgrep 'ERR' /var/log/ceph/*
>> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561164 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 recorded data digest 0x7fa9879c != on disk 0xa6798e03 on 10:1b42251f:{object.name}:head
>> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561225 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : deep-scrub 10.2d8 10:1b42251f:{object.name}:head on disk size (15913) does not match object info size (10280) adjusted for ondisk to (10280)
>> /var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 21:05:59.935815 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 deep-scrub 2 errors
> 
> 
>> $ ceph pg 10.2d8 query
>> {
>>     "state": "active+clean+inconsistent",
>>     "snap_trimq": "[]",
>>     "epoch": 21746,
>>     "up": [
>>         18,
>>         17,
>>         22
>>     ],
>>     "acting": [
>>         18,
>>         17,
>>         22
>>     ],
>>     "actingbackfill": [
>>         "17",
>>         "18",
>>         "22"
>>     ],
> 
> However, no recovery IO ever occurs, and the PG never returns to active+clean. I'm not seeing anything exciting in the logs of the OSDs or the mons.
> 
> I've found a few articles and mailing list entries that mention downing the OSD, flushing the journal, moving the object off the disk, starting the OSD, and running the repair command again.
> 
> However, after finding the object on disk and eyeballing the size and md5sum, all three copies appear to be identical.
>> $ ls -la /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> -rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> -rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> -rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> 
>> $ md5sum /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
>> 55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
> 
> Should I schedule another scrub? Should I do the whole down the OSD, flush journal, move object song and dance?
> 
> Hoping the user list will provide some insight into the proper steps to move forward with. And assuming the other inconsistent PG will fix itself once the failing OSD is replaced.
> 
> Thanks,
> 
> Reed
