Scrubs discovered the following inconsistency:
2018-08-23 17:21:07.933458 osd.62 osd.62 10.122.0.140:6805/77767 6 : cluster [ERR] 9.3cd shard 113: soid 9:b3cd8d89:::.dir.default.153398310.112:head omap_digest 0xea4ba012 != omap_digest 0xc5acebfd from shard 62, omap_digest 0xea4ba012 != omap_digest 0xc5acebfd from auth oi 9:b3cd8d89:::.dir.default.153398310.112:head(138609'2009129 osd.250.0:64658209 dirty|omap|data_digest|omap_digest s 0 uv 1995230 dd ffffffff od c5acebfd alloc_hint [0 0 0])
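For anyone less used to these messages, the ERR line can be pulled apart mechanically. The sketch below is a hypothetical Python helper (`parse_omap_mismatch` and `odd_shard` are illustrative names, not Ceph tooling). Its regular expression is written against this single sample, so other Ceph releases may word the message differently. It shows that shard 113 reports omap_digest 0xea4ba012, while both shard 62 and the authoritative object info (`od c5acebfd`) agree on 0xc5acebfd. In other words, shard 113 is the outlier.

```python
import re

# Assumption: the message layout matches the single sample in this post;
# other Ceph versions may format scrub errors differently.
ERR_RE = re.compile(
    r"cluster \[ERR\] (?P<pg>\S+) shard (?P<shard>\d+): soid (?P<soid>\S+) "
    r"omap_digest (?P<bad>0x[0-9a-f]+) != omap_digest (?P<peer>0x[0-9a-f]+) "
    r"from shard (?P<peer_shard>\d+), "
    r"omap_digest 0x[0-9a-f]+ != omap_digest (?P<auth>0x[0-9a-f]+) from auth oi"
)


def parse_omap_mismatch(line):
    """Return the fields of an omap_digest mismatch line, or None if it doesn't match."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    return {
        "pg": m["pg"],
        "shard": int(m["shard"]),           # shard that reported the mismatch
        "soid": m["soid"],
        "shard_digest": m["bad"],           # digest held by that shard
        "peer_shard": int(m["peer_shard"]),
        "peer_digest": m["peer"],           # digest held by the peer shard
        "auth_digest": m["auth"],           # digest from the authoritative object info
    }


def odd_shard(info):
    """Return the reporting shard if the peer and auth digests agree but it differs."""
    if info["peer_digest"] == info["auth_digest"] != info["shard_digest"]:
        return info["shard"]
    return None
```

Passing the log line above through `parse_omap_mismatch` and then `odd_shard` picks out shard 113 as the copy that disagrees.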
The omap_digest_mismatch appears on a non-primary OSD in a pool with 4 replicas. In this situation I decided to issue "pg repair", expecting Ceph to repair the broken object. The command returned successfully, but the repair of 9.3cd never started.
Then I tried the procedure described here (setting a temporary key on the object to force recalculation of omap_digest).
However, the deep-scrub on 9.3cd didn't start either. The OSD marked 9.3cd for scrubbing, but that is all that happened:
2018-08-27 14:36:22.703848 7faa7e860700 20 osd.62 713813 OSD::ms_dispatch: scrub([9.3cd] deep) v2
2018-08-27 14:36:22.703869 7faa7e860700 20 osd.62 713813 _dispatch 0x55725b76d180 scrub([9.3cd] deep) v2
2018-08-27 14:36:22.703871 7faa7e860700 10 osd.62 713813 handle_scrub scrub([9.3cd] deep) v2
2018-08-27 14:36:22.703878 7faa7e860700 10 osd.62 713813 marking pg[9.3cd( v 713813'2359292 (713107'2357731,713813'2359292] local-lis/les=711049/711050 n=41419 ec=178/178 lis/c 711049/711049 les/c/f 711050/711149/222921 711049/711049/710352) [62,53,163,113] r=0 lpr=711049 crt=713813'2359292 lcod 713813'2359291 mlcod 713813'2359291 active+clean+inconsistent MUST_DEEP_SCRUB MUST_SCRUB] for scrub
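The "marking pg[...] for scrub" line can be decoded the same way. This hypothetical helper (`parse_pg_marking` is an illustrative name) relies on a few assumptions about the log format, my own reading rather than something taken from Ceph documentation:

- The bracketed OSD list after the PG's version/history block is the acting set, with the primary first.
- `r=0` is this OSD's role in that set, and 0 means primary.

Applied to the line above, it gives OSDs [62, 53, 163, 113] with osd.62 as primary and a size of 4. It also gives the state active+clean+inconsistent and the flags MUST_DEEP_SCRUB and MUST_SCRUB. No `scrubbing` state appears, which is consistent with the scrub being queued but never running.

```python
import re

# Assumption: layout taken from the single "marking pg[...] for scrub" sample
# in this post; the field meanings (acting set, r=role) are inferred.
PG_RE = re.compile(
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?\) "
    r"\[(?P<osds>\d+(?:,\d+)*)\] r=(?P<role>-?\d+) .*? "
    r"(?P<state>[a-z_]+(?:\+[a-z_]+)*)(?P<flags>(?: [A-Z_]+)*)\] for scrub"
)


def parse_pg_marking(line):
    """Return PG id, OSD set, role and state/flags from a scrub-marking line, or None."""
    m = PG_RE.search(line)
    if m is None:
        return None
    osds = [int(x) for x in m["osds"].split(",")]
    return {
        "pgid": m["pgid"],
        "osds": osds,
        "primary": osds[0],                # first OSD in the list, assumed primary
        "size": len(osds),                 # number of replicas in the set
        "is_primary": m["role"] == "0",    # r=0 assumed to mean "this OSD is primary"
        "states": m["state"].split("+"),   # e.g. active, clean, inconsistent
        "flags": m["flags"].split(),       # e.g. MUST_DEEP_SCRUB, MUST_SCRUB
    }
```

Checking `"scrubbing" in result["states"]` on later log lines is one way to tell whether the requested scrub ever actually began.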
Does anyone know how to recover from an inconsistency in such a case?
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com