ceph pg inconsistencies - omap data lost

Hi all,

We have a weird issue with a few inconsistent PGs. We are running Ceph 11.2 on RHEL7.

Taking one of the inconsistent PGs as an example:

# rados -p volumes list-inconsistent-obj 4.19
{
  "epoch": 83986,
  "inconsistents": [
    {
      "object": {
        "name": "rbd_header.08f7fa43a49c7f",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 28785242
      },
      "errors": [],
      "union_shard_errors": ["omap_digest_mismatch_oi"],
      "selected_object_info": "4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])",
      "shards": [
        {"osd": 10, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
        {"osd": 20, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"},
        {"osd": 29, "errors": ["omap_digest_mismatch_oi"], "size": 0, "omap_digest": "0x62b5dcb6", "data_digest": "0xffffffff"}
      ]
    }
  ]
}
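When reading a report like this, the useful comparison is between the omap digest each shard reports and the one recorded in the object info (the `od` field inside `selected_object_info`). Below is a minimal Python sketch of that comparison, assuming the JSON has exactly the shape shown above; the sample is trimmed from that output, and the `summarize_omap_mismatches` helper is hypothetical, not part of any Ceph tool.

```python
import json
import re

# Sample trimmed from the `rados list-inconsistent-obj 4.19` output above.
SAMPLE = """
{"epoch":83986,"inconsistents":[{"object":{"name":"rbd_header.08f7fa43a49c7f"},
 "union_shard_errors":["omap_digest_mismatch_oi"],
 "selected_object_info":"4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])",
 "shards":[{"osd":10,"errors":["omap_digest_mismatch_oi"],"omap_digest":"0x62b5dcb6"},
           {"osd":20,"errors":["omap_digest_mismatch_oi"],"omap_digest":"0x62b5dcb6"},
           {"osd":29,"errors":["omap_digest_mismatch_oi"],"omap_digest":"0x62b5dcb6"}]}]}
"""

# The object info string records the expected omap digest as "od <8 hex digits>".
OD_RE = re.compile(r"\bod ([0-9a-f]{8})\b")


def summarize_omap_mismatches(report: dict) -> list:
    """Return (object name, osd, shard digest, expected digest) for every shard
    whose omap digest differs from the one recorded in the object info."""
    rows = []
    for item in report["inconsistents"]:
        name = item["object"]["name"]
        match = OD_RE.search(item["selected_object_info"])
        if match is None:
            continue  # no recorded omap digest to compare against
        expected = int(match.group(1), 16)
        for shard in item["shards"]:
            actual = int(shard["omap_digest"], 16)
            if actual != expected:
                rows.append((name, shard["osd"], actual, expected))
    return rows


if __name__ == "__main__":
    for name, osd, actual, expected in summarize_omap_mismatches(json.loads(SAMPLE)):
        print(f"{name} osd.{osd}: shard od 0x{actual:08x}, object info od 0x{expected:08x}")
```

For this PG it lists all three OSDs (10, 20, 29), each reporting `0x62b5dcb6` against an expected `0xffffffff`: the replicas agree with one another but not with the object info.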

If I try to repair this PG, I get the following in the OSD logs:

2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242 client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd ffffffff od ffffffff alloc_hint [0 0 0])
2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object
2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] : 4.19 deep-scrub 3 errors
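With several PGs affected, it can help to pull the per-shard digests and the "failed to pick suitable auth object" objects out of the logs programmatically. A minimal Python sketch, assuming the log lines keep exactly the format shown above; the regular expressions and the `parse_scrub_errors` helper are hypothetical and were written against these sample lines only.

```python
import re

# OSD log entries from the repair attempt above, one per line. The object-info
# text after "from auth oi" is trimmed; the parser does not need it.
LOG = """\
2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] : 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick suitable auth object
"""

# "<pg> shard <osd>: soid <object> omap_digest <on shard> != omap_digest <expected>"
SHARD_RE = re.compile(
    r"\[ERR\] : (?P<pg>\S+) shard (?P<osd>\d+): soid (?P<soid>\S+) "
    r"omap_digest (?P<shard>0x[0-9a-f]+) != omap_digest (?P<auth>0x[0-9a-f]+)"
)
# "<pg> soid <object>: failed to pick suitable auth object"
NO_AUTH_RE = re.compile(
    r"\[ERR\] : (?P<pg>\S+) soid (?P<soid>\S+): failed to pick suitable auth object"
)


def parse_scrub_errors(text: str):
    """Split log text into per-shard omap digest mismatches and the objects
    for which no authoritative copy could be chosen."""
    mismatches, no_auth = [], []
    for line in text.splitlines():
        m = SHARD_RE.search(line)
        if m:
            mismatches.append((m["pg"], int(m["osd"]), m["soid"],
                               int(m["shard"], 16), int(m["auth"], 16)))
            continue
        m = NO_AUTH_RE.search(line)
        if m:
            no_auth.append((m["pg"], m["soid"]))
    return mismatches, no_auth


if __name__ == "__main__":
    mismatches, no_auth = parse_scrub_errors(LOG)
    for pg, osd, soid, shard, auth in mismatches:
        print(f"{pg} osd.{osd} {soid}: 0x{shard:08x} != 0x{auth:08x}")
    for pg, soid in no_auth:
        print(f"{pg} {soid}: no authoritative copy")
```

On these lines it reports the same three shards as the JSON report, plus the one object for which repair could not choose an authoritative copy.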

If I list the omap values, there are none:

# rados -p volumes listomapvals rbd_header.08f7fa43a49c7f |wc -l
0


If I list the extended attributes of the object's file on each OSD that hosts it, they are indeed empty as well (all 3 OSDs show the same result; I'm listing just one for brevity):

# getfattr /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4
getfattr: Removing leading '/' from absolute path names
# file: var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\134uheader.08f7fa43a49c7f__head_6C8FC219__4
user.ceph._
user.ceph._@1
user.ceph._lock.rbd_lock
user.ceph.snapset
user.cephos.spill_out
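The file path above encodes the object's 32-bit hash (`6C8FC219`), which is handy when locating the same object on the other OSDs. As far as I understand the FileStore layout (an assumption, not something stated in this thread): the `DIR_` levels are the hash's hex digits taken least significant first, one per directory split, and the `9843f136` in `4:9843f136:::...` is the same hash with its 32 bits reversed. The Python sketch below checks both against this object. The helper names are hypothetical, `stable_mod` is a port of Ceph's `ceph_stable_mod()`, and `pg_num = 64` is a made-up value used only to show how the low bits of the hash select PG `0x19`.

```python
# Assumed FileStore conventions (not stated in the thread): head objects are stored
# as "<escaped name>__head_<HASH>__<pool>", with "_" in the name escaped as "\u".
FILENAME = r"rbd\uheader.08f7fa43a49c7f__head_6C8FC219__4"


def parse_filestore_name(filename: str):
    """Return (32-bit object hash, pool id) from a FileStore head-object file name."""
    _, _, tail = filename.rpartition("__head_")
    hash_hex, _, pool = tail.partition("__")
    return int(hash_hex, 16), int(pool)


def reverse_bits32(x: int) -> int:
    """Reverse the 32 bits of x (the form printed as <pool>:<hash>:::... in logs)."""
    out = 0
    for _ in range(32):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out


def dir_components(h: int, depth: int) -> list:
    """Hex digits of the hash, least significant first, as nested DIR_ names."""
    digits = f"{h:08X}"[::-1]
    return [f"DIR_{d}" for d in digits[:depth]]


def stable_mod(x: int, b: int, bmask: int) -> int:
    """Python port of ceph_stable_mod(): map a hash onto one of b PGs."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


if __name__ == "__main__":
    h, pool = parse_filestore_name(FILENAME)
    print(f"pool {pool}, hash {h:08X}, logged as {pool}:{reverse_bits32(h):08x}")
    print("/".join(dir_components(h, 3)))  # depth 3 matches the path above
    pg_num = 64  # hypothetical; the thread does not give the pool's pg_num
    print(f"pg {pool}.{stable_mod(h, pg_num, pg_num - 1):x}")
```

Run as-is, it prints `pool 4, hash 6C8FC219, logged as 4:9843f136`, then `DIR_9/DIR_1/DIR_2` and `pg 4.19`, which is consistent with the PG, the log entries and the path shown in this thread.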


Is there anything I can do to recover from this situation?


--
Kind regards,

Ben Morrice

______________________________________________________________________
Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


