Hi,

For the last few months I've been getting questions from people seeing warnings about large OMAP objects after scrubs. I've been digging into this for a while (you'll also find multiple threads about it) and it all seemed to trace back to RGW indexes: resharding didn't clean up old index objects properly, which caused the RGW index pool to keep growing and growing in number of objects.

Last week I got a case where an RGW-only cluster running on HDD became unusably slow. OSDs flapping, slow requests, the whole package. (yay!)

I traced it down to OSDs sometimes scanning through RocksDB (seen with debug bluefs), during which the HDD would be 100% busy for a few minutes. Compacting these OSDs could take more than 30 minutes and only helped for a while.

This cluster was running 12.2.8 and we upgraded it to 12.2.11 to run:

$ radosgw-admin reshard stale-instances list > instances.json
$ cat instances.json | jq -r '.[]' | wc -l

It showed that there were 88k stale instances. The rgw.buckets.index pool showed 222k objects according to 'ceph df'.

So we started to clean up the stale instances, as they are mainly stored as OMAP in RocksDB:

$ radosgw-admin reshard stale-instances rm

While this was running, OSDs would sometimes start to flap and we had to cancel, compact and restart the rm. After 6 days (!) of rm'ing, all the stale indexes were gone. The index pool went from 222k objects to just 43k objects. We compacted all the OSDs, which now took just 3 minutes, and things are running properly again.

As a precaution NVMe devices have been added, and using device classes we moved the index pool to NVMe-backed OSDs only. Nevertheless, this would not have worked on NVMe either: for some reason RocksDB couldn't handle the tens of millions of OMAP entries stored in these OSDs and would start to scan the whole DB. It could be that the 4GB of memory per OSD was simply not enough for RocksDB to hold all the index data, but I wasn't able to confirm that.

This cluster has ~1200 buckets in RGW and had 222k objects in the index pool prior to the cleanup. I got another call yesterday about a cluster with identical symptoms that has just 250 buckets, but ~700k (!!) objects in the RGW index pool.

My advice: upgrade to 12.2.11 and run the stale-instances list asap to see if you need to rm data. This isn't available in 13.2.4, but should be in 13.2.5, so on Mimic you will need to wait. But this might bite you at some point.

I hope I can prevent some admins from having sleepless nights over a flapping Ceph cluster.

Wido
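
PS: a few example commands, in case they help others triage this on their own clusters. The pool, object and OSD names below are just examples; adjust them to your environment and treat these as rough sketches rather than a recipe.

To see which index objects triggered the large OMAP warning, the cluster log on a mon host normally records them when a deep scrub finds one, and you can count the OMAP keys of a suspect index object directly:

$ grep -i 'large omap object' /var/log/ceph/ceph.log
$ rados -p rgw.buckets.index ls | wc -l
$ rados -p rgw.buckets.index listomapkeys <index-object> | wc -l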
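
Compacting an OSD's RocksDB can be done online through the admin socket on the OSD host, or offline with ceph-kvstore-tool while the OSD is stopped (osd.0 here is just an example id):

$ ceph daemon osd.0 compact
$ systemctl stop ceph-osd@0
$ ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
$ systemctl start ceph-osd@0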
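
And for moving the index pool to NVMe-backed OSDs with device classes, a replicated CRUSH rule restricted to the nvme class is enough; the rule name and pool name here are again just examples:

$ ceph osd crush rule create-replicated rgw-index-nvme default host nvme
$ ceph osd pool set rgw.buckets.index crush_rule rgw-index-nvme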