Hi,

Our cluster has an SSD pool that contains empty objects with about 100k OMAP keys each (similar to the rgw index pool). If we restart one of the associated SSD OSDs while writing just a few OMAP keys to the cluster, PGs take a very long time to recover, and `ceph status` reports 200k+ keys/s being recovered, even though only maybe a couple thousand new keys were created:

```
recovery: 0 B/s, 268.65k keys/s, 2 objects/s
```

What seems to be happening (but I'd love confirmation of that from a developer) is that any PG that was "tainted" while the OSD was restarting gets marked for recovery, and then, instead of just adding the missing keys, the existing keys are deleted and recreated. What makes me think a large number of keys are being deleted is that we're affected by https://tracker.ceph.com/issues/55324 (we're still running 16.2.7): after the recovery finishes we see slow ops caused by tombstones, and the only way to fix them is to compact the OSD (rough commands in the P.S. below).

Can someone confirm that this is really what's happening? Is this the expected behavior, or is there a way to make OMAP recovery more efficient?

Cheers,

-- Ben
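P.S. In case it helps, here's a rough sketch of how to reproduce the kind of setup described above. The pool/object/OSD names are made up, and the loop is only illustrative (it's slow, but it ends up with the same object shape: an empty object with ~100k OMAP keys):

```
# create an "empty" object carrying a large number of omap keys,
# similar in shape to an rgw bucket index shard
POOL=omap-test                       # placeholder pool name
ceph osd pool create $POOL 32 32
rados -p $POOL create bigdir.0
for i in $(seq 1 100000); do
    rados -p $POOL setomapval bigdir.0 "key_$i" "val_$i"
done

# restart one of the OSDs backing the PG (osd.12 is a placeholder) while
# a few more keys are written, then watch the keys/s figure in `ceph status`
systemctl restart ceph-osd@12 &
for i in $(seq 100001 102000); do
    rados -p $POOL setomapval bigdir.0 "key_$i" "val_$i"
done
ceph status
```

And the workaround we end up using for the tombstone-induced slow ops, once recovery has finished, is simply to compact the affected OSD (again, osd.12 is a placeholder), either online or offline:

```
# online compaction
ceph tell osd.12 compact

# or offline, with the OSD stopped
systemctl stop ceph-osd@12
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
systemctl start ceph-osd@12
```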