On Thu, Mar 07, 2019 at 01:37:55PM -0300, Herbert Alexander Faleiros wrote:
> Hi,
>
> # ceph health detail
> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 3 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 2.2bb is active+clean+inconsistent, acting [36,12,80]
>
> # ceph pg repair 2.2bb
> instructing pg 2.2bb on osd.36 to repair
>
> But:
>
> 2019-03-07 13:23:38.636881 [ERR] Health check update: Possible data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
> 2019-03-07 13:20:38.373431 [ERR] 2.2bb deep-scrub 3 errors
> 2019-03-07 13:20:38.373426 [ERR] 2.2bb deep-scrub 0 missing, 1 inconsistent objects
> 2019-03-07 13:20:43.486860 [ERR] Health check update: 3 scrub errors (OSD_SCRUB_ERRORS)
> 2019-03-07 13:19:17.741350 [ERR] deep-scrub 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : is an unexpected clone
> 2019-03-07 13:19:17.523042 [ERR] 2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : data_digest 0xffffffff != data_digest 0xfc6b9538 from shard 12, size 0 != size 4194304 from auth oi 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986(482757'14986708 client.112595650.0:344888465 dirty|omap_digest s 4194304 uv 14974021 od ffffffff alloc_hint [0 0 0]), size 0 != size 4194304 from shard 12
> 2019-03-07 13:19:17.523038 [ERR] 2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : candidate size 0 info size 4194304 mismatch
> 2019-03-07 13:16:48.542673 [ERR] 2.2bb repair 2 errors, 1 fixed
> 2019-03-07 13:16:48.542656 [ERR] 2.2bb repair 1 missing, 0 inconsistent objects
> 2019-03-07 13:16:53.774956 [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
> 2019-03-07 13:16:53.774916 [ERR] Health check update: 2 scrub errors (OSD_SCRUB_ERRORS)
> 2019-03-07 13:15:16.986872 [ERR] repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : is an unexpected clone
> 2019-03-07 13:15:16.986817 [ERR] 2.2bb shard 36 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : missing
> 2019-03-07 13:12:18.517442 [ERR] Health check update: Possible data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>
> Also tried deep-scrub and scrub, same results.
>
> Also set noscrub,nodeep-scrub and kicked currently active scrubs one at
> a time using 'ceph osd down <id>'. After the last scrub was kicked, a
> forced scrub ran immediately, then 'ceph pg repair', no luck.
>
> Finally tried the manual approach:
>
> - stop osd.36
> - flush-journal
> - rm rbd\udata.dfd5e2235befd0.000000000001c299__4f986_CBDE52BB__2
> - start osd.36
> - ceph pg repair 2.2bb
>
> Also no luck...
>
> rbd\udata.dfd5e2235befd0.000000000001c299__4f986_CBDE52BB__2 at osd.36
> is empty (0 size). At osd.80 it's 4.0M; osd.12 is bluestore (can't find
> the file there).
>
> Ceph is 12.2.10, I'm currently migrating all my OSDs to bluestore.
>
> Is there anything else I can do?

Should I do something like this? (below, after stopping osd.36)

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-36/ --journal-path /dev/sdc1 rbd_data.dfd5e2235befd0.000000000001c299 remove-clone-metadata 326022

I'm not sure about rbd_data.$RBD and $CLONEID (taken from rados list-inconsistent-obj, also below).
> # rados list-inconsistent-obj 2.2bb | jq
> {
>   "epoch": 484655,
>   "inconsistents": [
>     {
>       "object": {
>         "name": "rbd_data.dfd5e2235befd0.000000000001c299",
>         "nspace": "",
>         "locator": "",
>         "snap": 326022,
>         "version": 14974021
>       },
>       "errors": [
>         "data_digest_mismatch",
>         "size_mismatch"
>       ],
>       "union_shard_errors": [
>         "size_mismatch_info",
>         "obj_size_info_mismatch"
>       ],
>       "selected_object_info": {
>         "oid": {
>           "oid": "rbd_data.dfd5e2235befd0.000000000001c299",
>           "key": "",
>           "snapid": 326022,
>           "hash": 3420345019,
>           "max": 0,
>           "pool": 2,
>           "namespace": ""
>         },
>         "version": "482757'14986708",
>         "prior_version": "482697'14980304",
>         "last_reqid": "client.112595650.0:344888465",
>         "user_version": 14974021,
>         "size": 4194304,
>         "mtime": "2019-03-02 22:30:23.812849",
>         "local_mtime": "2019-03-02 22:30:23.813281",
>         "lost": 0,
>         "flags": [
>           "dirty",
>           "omap_digest"
>         ],
>         "legacy_snaps": [],
>         "truncate_seq": 0,
>         "truncate_size": 0,
>         "data_digest": "0xffffffff",
>         "omap_digest": "0xffffffff",
>         "expected_object_size": 0,
>         "expected_write_size": 0,
>         "alloc_hint_flags": 0,
>         "manifest": {
>           "type": 0,
>           "redirect_target": {
>             "oid": "",
>             "key": "",
>             "snapid": 0,
>             "hash": 0,
>             "max": 0,
>             "pool": -9223372036854776000,
>             "namespace": ""
>           }
>         },
>         "watchers": {}
>       },
>       "shards": [
>         {
>           "osd": 12,
>           "primary": false,
>           "errors": [],
>           "size": 4194304,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xfc6b9538"
>         },
>         {
>           "osd": 36,
>           "primary": true,
>           "errors": [
>             "size_mismatch_info",
>             "obj_size_info_mismatch"
>           ],
>           "size": 0,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xffffffff",
>           "object_info": {
>             "oid": {
>               "oid": "rbd_data.dfd5e2235befd0.000000000001c299",
>               "key": "",
>               "snapid": 326022,
>               "hash": 3420345019,
>               "max": 0,
>               "pool": 2,
>               "namespace": ""
>             },
>             "version": "482757'14986708",
>             "prior_version": "482697'14980304",
>             "last_reqid": "client.112595650.0:344888465",
>             "user_version": 14974021,
>             "size": 4194304,
>             "mtime": "2019-03-02 22:30:23.812849",
>             "local_mtime": "2019-03-02 22:30:23.813281",
>             "lost": 0,
>             "flags": [
>               "dirty",
>               "omap_digest"
>             ],
>             "legacy_snaps": [],
>             "truncate_seq": 0,
>             "truncate_size": 0,
>             "data_digest": "0xffffffff",
>             "omap_digest": "0xffffffff",
>             "expected_object_size": 0,
>             "expected_write_size": 0,
>             "alloc_hint_flags": 0,
>             "manifest": {
>               "type": 0,
>               "redirect_target": {
>                 "oid": "",
>                 "key": "",
>                 "snapid": 0,
>                 "hash": 0,
>                 "max": 0,
>                 "pool": -9223372036854776000,
>                 "namespace": ""
>               }
>             },
>             "watchers": {}
>           }
>         },
>         {
>           "osd": 80,
>           "primary": false,
>           "errors": [],
>           "size": 4194304,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xfc6b9538"
>         }
>       ]
>     }
>   ]
> }
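
To map the two values: $RBD would be the "name" field and $CLONEID the "snap" field in the report above. Below is a sketch of how I'd pull them out with jq, plus a read-only check of what osd.36 itself has for that object (with osd.36 stopped) before removing any clone metadata. I haven't run either yet, and the data/journal paths are just copied from my proposed command above, so treat them as assumptions:

# rados list-inconsistent-obj 2.2bb | jq -r '.inconsistents[].object | "\(.name) \(.snap)"'

(from the report above this should print: rbd_data.dfd5e2235befd0.000000000001c299 326022)

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-36/ --journal-path /dev/sdc1 --op list rbd_data.dfd5e2235befd0.000000000001c299

FWIW, 0x4f986 (the snapid in the scrub errors and in the on-disk filename) is 326022 in decimal, the same value as "snap"/"snapid" above, so at least the clone id looks consistent.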