Hi all,
We have a weird issue with a few inconsistent PGs. We are running Ceph
11.2 (Kraken) on RHEL 7.
Here is one of the inconsistent PGs as an example:
# rados -p volumes list-inconsistent-obj 4.19
{
  "epoch": 83986,
  "inconsistents": [
    {
      "object": {
        "name": "rbd_header.08f7fa43a49c7f",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 28785242
      },
      "errors": [],
      "union_shard_errors": ["omap_digest_mismatch_oi"],
      "selected_object_info": "4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])",
      "shards": [
        {"osd": 10, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
        {"osd": 20, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
        {"osd": 29, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"}
      ]
    }
  ]
}
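To make the pattern in this report explicit, here is a minimal Python 3 sketch that compares each shard's omap_digest with the omap digest recorded in the object info (the "od" field inside selected_object_info). It assumes the JSON has exactly the shape shown above. The names SAMPLE, OI_OMAP_DIGEST and summarize are hypothetical helpers, not part of Ceph.

```python
import json
import re

# Output of `rados -p volumes list-inconsistent-obj 4.19` from above,
# pretty-printed (newlines only between JSON tokens).
SAMPLE = """
{"epoch": 83986,
 "inconsistents": [
  {"object": {"name": "rbd_header.08f7fa43a49c7f", "nspace": "", "locator": "",
              "snap": "head", "version": 28785242},
   "errors": [],
   "union_shard_errors": ["omap_digest_mismatch_oi"],
   "selected_object_info": "4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])",
   "shards": [
    {"osd": 10, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
    {"osd": 20, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
    {"osd": 29, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"}]}]}
"""

# "od <hex>" in selected_object_info is the omap digest stored in the object info.
OI_OMAP_DIGEST = re.compile(r"\bod ([0-9a-f]{8})\b")


def summarize(report):
    """Compare each shard's omap digest with the one in the object info."""
    results = []
    for entry in report["inconsistents"]:
        match = OI_OMAP_DIGEST.search(entry["selected_object_info"])
        oi_digest = int(match.group(1), 16) if match else None
        shard_digests = {s["osd"]: int(s["omap_digest"], 16) for s in entry["shards"]}
        results.append({
            "object": entry["object"]["name"],
            "oi_omap_digest": oi_digest,
            "shard_omap_digests": shard_digests,
            # True when every replica computed the same omap digest
            "shards_agree": len(set(shard_digests.values())) == 1,
            # OSDs whose shard digest equals the digest in the object info
            "matching_osds": sorted(o for o, d in shard_digests.items() if d == oi_digest),
        })
    return results


if __name__ == "__main__":
    for r in summarize(json.loads(SAMPLE)):
        oi = r["oi_omap_digest"]
        shards = {osd: hex(d) for osd, d in r["shard_omap_digests"].items()}
        print(r["object"], "oi od:", hex(oi) if oi is not None else "n/a",
              "shards:", shards, "agree:", r["shards_agree"],
              "match oi:", r["matching_osds"])
```

For PG 4.19 it reports all three shards agreeing on 0x62b5dcb6 while the object info records 0xffffffff, so no shard matches the object info. That is consistent with the repair failure shown below.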
If I try to repair this PG, I get the following in the OSD logs:
2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object
2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] : 4.19 deep-scrub 3 errors
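When several PGs are affected, the same comparison can be pulled out of the OSD log instead. A minimal Python 3 sketch follows, assuming the log lines look exactly like the ones above; the regular expressions are fitted to these lines and are not a documented log format. LOG, SHARD_MISMATCH, NO_AUTH, omap_mismatches and objects_without_auth are hypothetical names.

```python
import re

# The [ERR] lines from the OSD log above, one log entry per line.
LOG = """\
2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object
2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] : 4.19 deep-scrub 3 errors
"""

# Patterns fitted to the lines above (an assumption, not a documented format).
SHARD_MISMATCH = re.compile(
    r"\[ERR\] : (?P<pg>\S+) shard (?P<osd>\d+): soid (?P<soid>\S+) "
    r"omap_digest (?P<shard>0x[0-9a-f]+) != omap_digest (?P<oi>0x[0-9a-f]+) from auth oi"
)
NO_AUTH = re.compile(
    r"\[ERR\] : (?P<pg>\S+) soid (?P<soid>\S+): failed to pick suitable auth object"
)


def omap_mismatches(log_text):
    """Return (pg, osd, soid, shard digest, object-info digest) per shard error."""
    return [
        (m["pg"], int(m["osd"]), m["soid"], int(m["shard"], 16), int(m["oi"], 16))
        for m in SHARD_MISMATCH.finditer(log_text)
    ]


def objects_without_auth(log_text):
    """Return (pg, soid) for every 'failed to pick suitable auth object' error."""
    return [(m["pg"], m["soid"]) for m in NO_AUTH.finditer(log_text)]


if __name__ == "__main__":
    for pg, osd, soid, shard, oi in omap_mismatches(LOG):
        print(f"pg {pg} osd.{osd} {soid}: shard {shard:#010x} vs object info {oi:#010x}")
    print("no auth copy:", objects_without_auth(LOG))
```

For this PG it lists osd.10, osd.20 and osd.29, each with 0x62b5dcb6 against 0xffffffff, plus the single object for which no auth copy could be picked.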
If I list the omap values, there are none:
# rados -p volumes listomapvals rbd_header.08f7fa43a49c7f | wc -l
0
If I list the extended attributes of the object's file on each OSD that
hosts it, the omap values appear to be empty there too (the output is
identical on all 3 OSDs, so I'm only listing one for brevity):
# getfattr /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4
getfattr: Removing leading '/' from absolute path names
# file: var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\134uheader.08f7fa43a49c7f__head_6C8FC219__4
user.ceph._
user.ceph._@1
user.ceph._lock.rbd_lock
user.ceph.snapset
user.cephos.spill_out
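The on-disk path above can be tied back to the object name. Below is a minimal Python 3 sketch based on my understanding of FileStore's naming (an assumption, not checked against the 11.2 source): underscores in the object name are escaped as \u (the shell command needs rbd\\uheader, and getfattr prints the backslash in octal as \134), the file name has the form <name>_<key>_<snap>_<HASH>_<namespace>_<pool>, the DIR_ levels are the hex digits of the hash taken least significant first, and the hex after the pool id in the soid (4:9843f136:::...) is the 32-bit hash with its bits reversed. The helper names are hypothetical.

```python
import re

# Assumed FileStore escape table; only the cases I am fairly sure of.
_UNESCAPE = {"u": "_", "s": "/", "\\": "\\"}


def unescape(field):
    """Undo FileStore's backslash escaping of object-name characters."""
    return re.sub(r"\\(.)", lambda m: _UNESCAPE.get(m.group(1), m.group(0)), field)


def parse_filestore_name(filename):
    """Split '<name>_<key>_<snap>_<HASH>_<namespace>_<pool>' (escaped parts have no raw '_')."""
    name, key, snap, hash_hex, nspace, pool = filename.split("_")
    return {
        "name": unescape(name),
        "key": unescape(key),
        "snap": snap,
        "hash": int(hash_hex, 16),
        "nspace": unescape(nspace),
        "pool": int(pool, 16),  # pool id assumed to be written in hex
    }


def nibble_dirs(hash32, depth):
    """DIR_ components: hex digits of the hash, least significant first.

    depth is how far the PG directory has been split (3 levels here)."""
    return ["DIR_%X" % ((hash32 >> (4 * i)) & 0xF) for i in range(depth)]


def bitwise_key(hash32):
    """32-bit bit reversal of the hash, as printed after the pool id in a soid."""
    return int(format(hash32, "032b")[::-1], 2)


if __name__ == "__main__":
    # On-disk name (without the shell's extra backslash) from the getfattr output.
    info = parse_filestore_name(r"rbd\uheader.08f7fa43a49c7f__head_6C8FC219__4")
    print(info["name"], info["snap"], "pool", info["pool"])
    print("/".join(nibble_dirs(info["hash"], 3)))  # hash-directory levels
    print("%08x" % bitwise_key(info["hash"]))      # key shown in the soid
```

For this object it recovers rbd_header.08f7fa43a49c7f in pool 4, the DIR_9/DIR_1/DIR_2 path, and the 9843f136 key from the soid, which suggests the getfattr listing is for the same object the scrub complained about.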
Is there anything I can do to recover from this situation?
--
Kind regards,
Ben Morrice
______________________________________________________________________
Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com