On Tue, Apr 4, 2017 at 7:09 AM, Ben Morrice <ben.morrice@xxxxxxx> wrote:
> Hi all,
>
> We have a weird issue with a few inconsistent PGs. We are running ceph 11.2
> on RHEL7.
>
> As an example inconsistent PG we have:
>
> # rados -p volumes list-inconsistent-obj 4.19
> {"epoch":83986,"inconsistents":[{"object":{"name":"rbd_header.08f7fa43a49c7f","nspace":"","locator":"","snap":"head","version":28785242},"errors":[],"union_shard_errors":["omap_digest_mismatch_oi"],"selected_object_info":"4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])","shards":[{"osd":10,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"},{"osd":20,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"},{"osd":29,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"}]}]}
>
> If I try to repair this PG, I get the following in the OSD logs:
>
> 2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object
> 2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] : 4.19 deep-scrub 3 errors
>
> If I list the omap values, they are null:
>
> # rados -p volumes listomapvals rbd_header.08f7fa43a49c7f | wc -l
> 0
>
> If I list the extended attributes on the filesystem of each OSD that hosts
> this file, they are indeed empty (all 3 OSDs are the same, but just listing
> one for brevity):
>
> getfattr /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4
> getfattr: Removing leading '/' from absolute path names
> # file: var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\134uheader.08f7fa43a49c7f__head_6C8FC219__4
> user.ceph._
> user.ceph._@1
> user.ceph._lock.rbd_lock
> user.ceph.snapset
> user.cephos.spill_out
>
> Is there anything I can do to recover from this situation?

This is probably late, but for future reference, you can use
ceph-objectstore-tool against the local OSDs to examine their specific
state (as opposed to the rados listomapvals command, which only looks at
the primary).
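For example, something along these lines (only a sketch: the data/journal
paths, PG id, and object name are copied from your output above, so adjust
them for whichever OSD you are inspecting, and the OSD daemon has to be
stopped before ceph-objectstore-tool will open its store):

# systemctl stop ceph-osd@29
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
      --journal-path /var/lib/ceph/osd/ceph-29/journal \
      --pgid 4.19 rbd_header.08f7fa43a49c7f list-omap      # omap keys actually on this OSD
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
      --journal-path /var/lib/ceph/osd/ceph-29/journal \
      --pgid 4.19 rbd_header.08f7fa43a49c7f dump           # object info/attrs as stored locally
# systemctl start ceph-osd@29

Repeating that on osd.10, osd.20 and osd.29 shows whether the on-disk
copies really match each other and the recorded object_info.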
If you have a valid replica, you generally just use that tool to delete
the primary's copy of the object and copy it back over from the replicas,
or run a repair, which does that for you (a rough sketch of that workflow
follows below).
-Greg
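For future readers, roughly what that workflow can look like (again only a
sketch: the OSD id, paths, and object name are taken from the thread above,
so verify which OSD is actually primary with "ceph pg map 4.19", take an
export of the PG first as a safety net, and keep the OSD stopped while
ceph-objectstore-tool runs):

# ceph pg map 4.19
# systemctl stop ceph-osd@10      # assuming osd.10 turned out to be the primary
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
      --journal-path /var/lib/ceph/osd/ceph-10/journal \
      --op export --pgid 4.19 --file /root/pg4.19-osd10.export    # backup copy of the PG
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
      --journal-path /var/lib/ceph/osd/ceph-10/journal \
      --pgid 4.19 rbd_header.08f7fa43a49c7f remove                # drop the bad local copy
# systemctl start ceph-osd@10
# ceph pg repair 4.19

Once the primary's copy is gone, recovery/repair can rebuild it from the
remaining replicas.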