Inconsistent PGs caused by omap_digest mismatch

We have two separate RGW clusters running Luminous (12.2.8) that have started seeing an increase in PGs going active+clean+inconsistent, with the cause being an omap_digest mismatch.  Both clusters are using FileStore, and the inconsistent PGs are happening on the .rgw.buckets.index pool, which was moved from HDDs to SSDs within the last few months.
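
For anyone wanting to check for the same symptom, the affected PGs show up in ceph health detail, and the per-pool list can be pulled with rados:

# ceph health detail | grep inconsistent
# rados list-inconsistent-pg .rgw.buckets.index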

We've been repairing them by first making sure the OSD with the odd omap_digest is not the primary (setting its primary-affinity to 0 if needed), running the repair, and then setting the primary-affinity back to 1.  As far as we can tell, repair treats the primary's copy as the authoritative one when the digests disagree, so if the odd copy were on the primary it would get written over the two good replicas.

For example, PG 7.3 went inconsistent earlier today:

# rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | .errors, .shards'
[
  "omap_digest_mismatch"
]
[
  {
    "osd": 504,
    "primary": true,
    "errors": [],
    "size": 0,
    "omap_digest": "0x4c10ee76",
    "data_digest": "0xffffffff"
  },
  {
    "osd": 525,
    "primary": false,
    "errors": [],
    "size": 0,
    "omap_digest": "0x26a1241b",
    "data_digest": "0xffffffff"
  },
  {
    "osd": 556,
    "primary": false,
    "errors": [],
    "size": 0,
    "omap_digest": "0x26a1241b",
    "data_digest": "0xffffffff"
  }
]
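
With three replicas where exactly one shard disagrees, the odd OSD can be picked out mechanically (a small jq sketch, assuming the object has no other errors):

# rados list-inconsistent-obj 7.3 -f json | jq -r '.inconsistents[].shards | group_by(.omap_digest)[] | select(length == 1) | .[].osd'
504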

Since the odd omap_digest (0x4c10ee76, vs. 0x26a1241b on the other two shards) is on osd.504, and osd.504 is the primary, we would set its primary-affinity to 0 with:

# ceph osd primary-affinity osd.504 0
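
Before kicking off the repair, we double-check that the primary actually moved off osd.504 (ceph pg map prints the up and acting sets; osd.504 should no longer be listed first in the acting set):

# ceph pg map 7.3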

Do the repair:

# ceph pg repair 7.3

And then, once the repair is complete, we would set the primary-affinity back to 1 on osd.504:

# ceph osd primary-affinity osd.504 1
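
Once that's done, we confirm the PG went back to active+clean and the health warning cleared:

# ceph health detail | grep 7.3
# ceph pg ls inconsistent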

There doesn't appear to be any correlation between the OSDs that would point to a hardware issue, and since it's happening on two different clusters, I'm wondering whether this is a race condition that has been fixed in a later version?

Also, what exactly is the omap digest?  From what I can tell it appears to be some kind of checksum for the omap data.  Is that correct?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


