Jewel PG stuck inconsistent with 3 0-size objects

Hi,

Our cluster is running 10.2.9 (from the Ubuntu packages, on 16.04 LTS), and
we have a pg that's stuck inconsistent; if I repair it, the repair fails
with "failed to pick suitable auth object" (repair log attached, to try to
stop my MUA mangling it).
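
For context, a minimal sketch of the commands involved (standard CLI; 67.2e
is the pg id from our cluster):

ceph health detail      # lists the inconsistent pg
ceph pg repair 67.2e    # triggers the repair that fails as below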

We then deep-scrubbed that pg, at which point
rados list-inconsistent-obj 67.2e --format=json-pretty produces a bit of
output (also attached), which shows that all 3 OSDs hold a zero-sized
object, e.g.

                    "osd": 1937,
                    "errors": [
                        "omap_digest_mismatch_oi"
                    ],
                    "size": 0,
                    "omap_digest": "0x45773901",
                    "data_digest": "0xffffffff"

All 3 OSDs report a different omap_digest, but all report a size of 0.
Indeed, looking at the OSD disks directly, each object's data file is 0
bytes (i.e. the data portions are identical; it is the omaps that differ).
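
Roughly the on-disk check I mean, as a sketch (default filestore layout on
Jewel; the OSD id, mount point, and filename glob are illustrative, so
adjust for your deployment):

# run on each host holding a replica (osds 1937, 1987, 2796)
find /var/lib/ceph/osd/ceph-1937/current/67.2e_head/ \
     -name '*7130858*' -exec stat -c '%s %n' {} \;
# -> each matching data file is 0 bytes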

This looks similar to one of the failure modes in
http://tracker.ceph.com/issues/21388 where there is a suggestion (comment
19 from David Zafman) to do:

rados -p default.rgw.buckets.index setomapval \
    .dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key anything
ceph pg deep-scrub 67.2e    # the "[deep-scrub]" step from the tracker comment
rados -p default.rgw.buckets.index rmomapkey \
    .dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6 temporary-key
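
My (possibly wrong) reading of that comment is that the temporary omap
write forces the object info to be rewritten, so the deep-scrub in between
can record a fresh omap_digest. Either way, I would then re-check with
something like:

ceph pg deep-scrub 67.2e
rados list-inconsistent-obj 67.2e --format=json-pretty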

Is this likely to be the correct approach here, too? And is there an
underlying bug in Ceph that still needs fixing? :)

Thanks,

Matthew



-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
--- repair log ---

2018-07-16 09:17:33.351755 7f058a047700  0 log_channel(cluster) log [INF] : 67.2e repair starts
2018-07-16 09:17:51.521378 7f0587842700 -1 log_channel(cluster) log [ERR] : 67.2e shard 1937: soid 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head omap_digest 0x45773901 != omap_digest 0x952ce474 from auth oi 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd ffffffff od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521463 7f0587842700 -1 log_channel(cluster) log [ERR] : 67.2e shard 1987: soid 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head omap_digest 0xec3afbe != omap_digest 0x45773901 from shard 1937, omap_digest 0xec3afbe != omap_digest 0x952ce474 from auth oi 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd ffffffff od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521653 7f0587842700 -1 log_channel(cluster) log [ERR] : 67.2e shard 2796: soid 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head omap_digest 0x5eec6452 != omap_digest 0x45773901 from shard 1937, omap_digest 0x5eec6452 != omap_digest 0x952ce474 from auth oi 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd ffffffff od 952ce474 alloc_hint [0 0])
2018-07-16 09:17:51.521702 7f0587842700 -1 log_channel(cluster) log [ERR] : 67.2e soid 67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head: failed to pick suitable auth object
2018-07-16 09:17:51.521988 7f0587842700 -1 log_channel(cluster) log [ERR] : 67.2e repair 4 errors, 0 fixed
--- rados list-inconsistent-obj 67.2e output ---

{
    "epoch": 514919,
    "inconsistents": [
        {
            "object": {
                "name": ".dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 17812259
            },
            "errors": [
                "omap_digest_mismatch"
            ],
            "union_shard_errors": [
                "omap_digest_mismatch_oi"
            ],
            "selected_object_info": "67:7463f933:::.dir.861ae926-7ff0-48c5-86d6-a6ba8d0a7a14.7130858.6:head(444843'17812260 osd.1987.0:16910852 dirty|omap|data_digest|omap_digest s 0 uv 17812259 dd ffffffff od 952ce474 alloc_hint [0 0])",
            "shards": [
                {
                    "osd": 1937,
                    "errors": [
                        "omap_digest_mismatch_oi"
                    ],
                    "size": 0,
                    "omap_digest": "0x45773901",
                    "data_digest": "0xffffffff"
                },
                {
                    "osd": 1987,
                    "errors": [
                        "omap_digest_mismatch_oi"
                    ],
                    "size": 0,
                    "omap_digest": "0x0ec3afbe",
                    "data_digest": "0xffffffff"
                },
                {
                    "osd": 2796,
                    "errors": [
                        "omap_digest_mismatch_oi"
                    ],
                    "size": 0,
                    "omap_digest": "0x5eec6452",
                    "data_digest": "0xffffffff"
                }
            ]
        }
    ]
}