Re: ceph pg inconsistencies - omap data lost

On Tue, Apr 4, 2017 at 7:09 AM, Ben Morrice <ben.morrice@xxxxxxx> wrote:
> Hi all,
>
> We have a weird issue with a few inconsistent PGs. We are running ceph 11.2
> on RHEL7.
>
> As an example inconsistent PG we have:
>
> # rados -p volumes list-inconsistent-obj 4.19
> {"epoch":83986,"inconsistents":[{"object":{"name":"rbd_header.08f7fa43a49c7f","nspace":"","locator":"","snap":"head","version":28785242},"errors":[],"union_shard_errors":["omap_digest_mismatch_oi"],"selected_object_info":"4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242
> client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd
> ffffffff od ffffffff alloc_hint [0 0
> 0])","shards":[{"osd":10,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"},{"osd":20,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"},{"osd":29,"errors":["omap_digest_mismatch_oi"],"size":0,"omap_digest":"0x62b5dcb6","data_digest":"0xffffffff"}]}]}
>
> If I try to repair this PG, I get the following in the OSD logs:
>
> 2017-04-04 14:31:37.825833 7f2d7f802700 -1 log_channel(cluster) log [ERR] :
> 4.19 shard 10: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest
> 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
> 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242
> client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd
> ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825863 7f2d7f802700 -1 log_channel(cluster) log [ERR] :
> 4.19 shard 20: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest
> 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
> 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242
> client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd
> ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825870 7f2d7f802700 -1 log_channel(cluster) log [ERR] :
> 4.19 shard 29: soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head omap_digest
> 0x62b5dcb6 != omap_digest 0xffffffff from auth oi
> 4:9843f136:::rbd_header.08f7fa43a49c7f:head(82935'28785242
> client.118028302.0:3057684 dirty|data_digest|omap_digest s 0 uv 28785242 dd
> ffffffff od ffffffff alloc_hint [0 0 0])
> 2017-04-04 14:31:37.825877 7f2d7f802700 -1 log_channel(cluster) log [ERR] :
> 4.19 soid 4:9843f136:::rbd_header.08f7fa43a49c7f:head: failed to pick
> suitable auth object
> 2017-04-04 14:32:37.926980 7f2d7cffd700 -1 log_channel(cluster) log [ERR] :
> 4.19 deep-scrub 3 errors
>
> If I list the omap values for this object, they are empty:
>
> # rados -p volumes listomapvals rbd_header.08f7fa43a49c7f |wc -l
> 0
>
>
> If I list the extended attributes on the filesystem of each OSD that hosts
> this file, they are indeed empty (all 3 OSDs are the same; listing just one
> for brevity):
>
> getfattr
> /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4
> getfattr: Removing leading '/' from absolute path names
> # file:
> var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\134uheader.08f7fa43a49c7f__head_6C8FC219__4
> user.ceph._
> user.ceph._@1
> user.ceph._lock.rbd_lock
> user.ceph.snapset
> user.cephos.spill_out
>
>
> Is there anything I can do to recover from this situation?

This is probably too late, but for future reference: you can run
ceph-objectstore-tool against the local OSDs to examine each replica's
specific state (as opposed to the rados listomapvals command, which
only queries the primary). If you have a valid replica, you can
generally use that tool to delete the primary's copy of the object and
copy it over from a replica, or run a repair, which does that for you.
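
Something along these lines (an illustrative, untested sketch only — the
OSD id 29 and the data/journal paths are taken from your getfattr output
above, and the OSD has to be stopped while the tool runs):

# ceph osd set noout
# systemctl stop ceph-osd@29
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
      --journal-path /var/lib/ceph/osd/ceph-29/journal \
      --pgid 4.19 rbd_header.08f7fa43a49c7f list-omap

If one of the copies turns out to hold good omap data, you can remove the
bad copy so recovery restores it from the good one, then restart the OSD
and kick off a repair:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
      --journal-path /var/lib/ceph/osd/ceph-29/journal \
      --pgid 4.19 rbd_header.08f7fa43a49c7f remove
# systemctl start ceph-osd@29
# ceph osd unset noout
# ceph pg repair 4.19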
-Greg


