On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
>
> We have two separate RGW clusters running Luminous (12.2.8) that have started seeing an increase in PGs going active+clean+inconsistent, with the cause being an omap_digest mismatch. Both clusters are using FileStore, and the inconsistent PGs are happening on the .rgw.buckets.index pool, which was moved from HDDs to SSDs within the last few months.
>
> We've been repairing them by first making sure the OSD with the odd omap_digest is not the primary (setting its primary-affinity to 0 if needed), doing the repair, and then setting the primary-affinity back to 1.
>
> For example, PG 7.3 went inconsistent earlier today:
>
> # rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | .errors, .shards'
> [
>   "omap_digest_mismatch"
> ]
> [
>   {
>     "osd": 504,
>     "primary": true,
>     "errors": [],
>     "size": 0,
>     "omap_digest": "0x4c10ee76",
>     "data_digest": "0xffffffff"
>   },
>   {
>     "osd": 525,
>     "primary": false,
>     "errors": [],
>     "size": 0,
>     "omap_digest": "0x26a1241b",
>     "data_digest": "0xffffffff"
>   },
>   {
>     "osd": 556,
>     "primary": false,
>     "errors": [],
>     "size": 0,
>     "omap_digest": "0x26a1241b",
>     "data_digest": "0xffffffff"
>   }
> ]
>
> Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would set its primary-affinity to 0 with:
>
> # ceph osd primary-affinity osd.504 0
>
> Do the repair:
>
> # ceph pg repair 7.3
>
> And then, once the repair is complete, we would set the primary-affinity back to 1 on osd.504:
>
> # ceph osd primary-affinity osd.504 1
>
> There doesn't appear to be any correlation between the OSDs that would point to a hardware issue, and since it's happening on two different clusters I'm wondering if there's a race condition that has been fixed in a later version?
>
> Also, what exactly is the omap digest? From what I can tell it appears to be some kind of checksum for the omap data. Is that correct?

Yeah; it's just a crc over the omap key-value data that's checked during deep scrub, same as the data digest. I haven't noticed any issues around this in Luminous, but I probably wouldn't have, so I'll have to leave it to others to say whether any fixes have gone in since 12.2.8.

-Greg
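
As an aside, the per-PG steps Bryan describes above can be strung together into a small script. The sketch below is untested and makes simplifying assumptions: it expects jq alongside the standard ceph/rados CLIs shown in the thread, assumes the PG has a single inconsistent object whose shards split 2-vs-1 on omap_digest, and it does not wait for the repair or re-peering to finish before restoring the primary-affinity, which you would want to do in practice (e.g. by watching `ceph -s` or `ceph pg <pgid> query`).

#!/bin/bash
# Sketch of Bryan's procedure: demote the OSD holding the odd omap_digest if it
# is the primary, repair the PG, then restore its primary-affinity.
set -euo pipefail

pg="$1"   # e.g. 7.3

# Shard list for the first (assumed only) inconsistent object in the PG.
shards=$(rados list-inconsistent-obj "$pg" -f json | jq -c '.inconsistents[0].shards')

# The omap_digest value that appears only once is the odd one out.
odd_digest=$(echo "$shards" | jq -r 'group_by(.omap_digest) | map(select(length == 1)) | .[0][0].omap_digest')

# Which OSD holds the odd digest, and whether it is currently the primary.
odd_osd=$(echo "$shards" | jq -r --arg d "$odd_digest" '.[] | select(.omap_digest == $d) | .osd')
is_primary=$(echo "$shards" | jq -r --arg d "$odd_digest" '.[] | select(.omap_digest == $d) | .primary')

# Make sure the odd copy is not the primary before repairing.
if [ "$is_primary" = "true" ]; then
    ceph osd primary-affinity "osd.$odd_osd" 0
fi

ceph pg repair "$pg"

# NOTE: wait for the repair to actually complete before this step; the sketch
# does not poll for it.
if [ "$is_primary" = "true" ]; then
    ceph osd primary-affinity "osd.$odd_osd" 1
fi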