Ceph PG repair

Over the weekend, two inconsistent PGs popped up in my cluster. This comes after having scrubs disabled for close to six weeks while we worked through a very long rebalance (adding 33% more OSDs, an OSD failure, a PG count increase, etc.).

It appears we came out the other end with two inconsistent PGs, and I'm trying to resolve them without much luck so far.
Ubuntu 16.04, Jewel 10.2.5, 3x replicated pool for reference.

$ ceph health detail
HEALTH_ERR 2 pgs inconsistent; 3 scrub errors; noout,sortbitwise,require_jewel_osds flag(s) set
pg 10.7bd is active+clean+inconsistent, acting [8,23,17]
pg 10.2d8 is active+clean+inconsistent, acting [18,17,22]
3 scrub errors

$ rados list-inconsistent-pg objects
["10.2d8","10.7bd”]

Pretty straightforward: two PGs with inconsistent copies. Let's dig deeper.

$ rados list-inconsistent-obj 10.2d8 --format=json-pretty
{
    "epoch": 21094,
    "inconsistents": [
        {
            "object": {
                "name": “object.name",
                "nspace": “namespace.name",
                "locator": "",
                "snap": "head"
            },
            "errors": [],
            "shards": [
                {
                    "osd": 17,
                    "size": 15913,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xa6798e03",
                    "errors": []
                },
                {
                    "osd": 18,
                    "size": 15913,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xa6798e03",
                    "errors": []
                },
                {
                    "osd": 22,
                    "size": 15913,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xa6798e03",
                    "errors": [
                        "data_digest_mismatch_oi"
                    ]
                }
            ]
        }
    ]
}

$ rados list-inconsistent-obj 10.7bd --format=json-pretty
{
    "epoch": 21070,
    "inconsistents": [
        {
            "object": {
                "name": “object2.name",
                "nspace": “namespace.name",
                "locator": "",
                "snap": "head"
            },
            "errors": [
                "read_error"
            ],
            "shards": [
                {
                    "osd": 8,
                    "size": 27691,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x9ce36903",
                    "errors": []
                },
                {
                    "osd": 17,
                    "size": 27691,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x9ce36903",
                    "errors": []
                },
                {
                    "osd": 23,
                    "size": 27691,
                    "errors": [
                        "read_error"
                    ]
                }
            ]
        }
    ]
}

So we have one PG (10.7bd) with a read error on osd.23, which is known and scheduled for replacement.
We also have a data digest mismatch on PG 10.2d8 on osd.22, which I have been attempting to repair with no real tangible results.

$ ceph pg repair 10.2d8
instructing pg 10.2d8 on osd.18 to repair

I've run the ceph pg repair command multiple times, and each time it instructs osd.18 to repair the PG.
Am I right to assume that osd.18 is the acting primary for these copies, and that it is being told to push its known-good copy of the object over the agreed-upon wrong version on osd.22?
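
To sanity-check that assumption, I figure the primary can be confirmed directly (quick check, nothing cluster-specific assumed):

$ ceph pg map 10.2d8

which should report the same up/acting sets with osd.18 listed first.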

$ zgrep 'ERR' /var/log/ceph/*
/var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561164 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 recorded data digest 0x7fa9879c != on disk 0xa6798e03 on 10:1b42251f:{object.name}:head
/var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 20:45:21.561225 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : deep-scrub 10.2d8 10:1b42251f:{object.name}:head on disk size (15913) does not match object info size (10280) adjusted for ondisk to (10280)
/var/log/ceph/ceph-osd.18.log.7.gz:2017-02-23 21:05:59.935815 7fc8dfeb8700 -1 log_channel(cluster) log [ERR] : 10.2d8 deep-scrub 2 errors

$ ceph pg 10.2d8 query
{
    "state": "active+clean+inconsistent",
    "snap_trimq": "[]",
    "epoch": 21746,
    "up": [
        18,
        17,
        22
    ],
    "acting": [
        18,
        17,
        22
    ],
    "actingbackfill": [
        "17",
        "18",
        "22"
    ],

However, no recovery I/O ever occurs, and the PG never goes back to plain active+clean. I'm not seeing anything exciting in the logs of the OSDs or the mons.
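
To verify whether the repair ever actually runs, my plan is to compare the PG's scrub timestamps before and after issuing the command and to watch the cluster log while it happens; something along these lines should do it (field names from memory, so corrections welcome):

$ ceph pg 10.2d8 query | grep -E 'last_(deep_)?scrub_stamp'
$ ceph -w | grep 10.2d8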

I've found a few articles and mailing list entries that mention downing the OSD, flushing the journal, moving the object off the disk, starting the OSD, and running the repair command again.
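
For reference, this is roughly the sequence those posts describe, as I understand it. A sketch only, using osd.22 as the target since it holds the flagged shard; I have not actually run this yet:

$ sudo systemctl stop ceph-osd@22        # noout is already set cluster-wide
$ sudo ceph-osd -i 22 --flush-journal    # flush the journal before touching the store
$ sudo mv /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name} /root/
$ sudo systemctl start ceph-osd@22
$ ceph pg repair 10.2d8                  # ask the primary to repair the PG again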

However, after finding the object on disk and eyeballing the size and the md5sum, all three copies appear to be identical.
$ ls -la /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
-rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
-rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
-rw-r--r-- 1 ceph ceph 15913 Jan 27 02:31 /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}

$ md5sum /var/lib/ceph/osd/ceph-*/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-17/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-18/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
55a76349b758d68945e5028784c59f24  /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}
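
Since the file contents match everywhere, I'm starting to suspect the problem is only in the object-info metadata, which (as far as I understand FileStore) lives in the file's xattrs rather than in the data itself. Something like this should at least show the raw attributes on the flagged copy, though decoding the object_info blob is another story:

$ sudo getfattr -d -m '.*' -e hex /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name}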

Should I schedule another deep scrub? Should I go through the whole down-the-OSD, flush-the-journal, move-the-object song and dance?
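
If another scrub is the answer, I assume kicking one off manually is just:

$ ceph pg deep-scrub 10.2d8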

Hoping the user list can provide some insight into the proper steps to move forward with. I'm also assuming the other inconsistent PG (10.7bd) will fix itself once the failing osd.23 is replaced.

Thanks,

Reed
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
