On Friday, 25 March 2022 at 22:40:01 CET, David Orman wrote:
> Hi Gilles,
>
> Did you ever figure this out? Also, your rados ls output indicates that the
> prod cluster has fewer objects in the index pool than the backup cluster,
> or am I misreading this?
>
> David

Yes, we have more objects on our backup cluster.

Some news since then:

- Searching around, I'm more and more convinced that we have remaining index
  objects due to automatic resharding, and especially failed auto resharding
  (we hit that bug: https://tracker.ceph.com/issues/51429 [1]).
- As it seems that many of these objects, and the ones with large omap, come
  from one big bucket, we tried to recreate it (see the command sketch
  further down, before the quoted thread):
  * copy everything to a new temporary bucket,
  * recreate the bucket,
  * manually reshard it to a suitable value,
  * and copy the objects back from the temporary bucket.

One new thing also: we have configured a second cluster on another site and
activated multi-site between them, so there is no more dynamic resharding.
But here again we hit a bug, this time concerning multi-tenancy, and
replication is not working... https://tracker.ceph.com/issues/50785 [2]

In conclusion, our Octopus cluster still has objects with large omap in our
RGW index pool...
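For reference, the bucket recreation looked roughly like this. This is a
sketch only: the bucket name, the rclone remote and the shard count are
made-up examples, not our real ones, and the stale-instances check at the
end is something I am still evaluating rather than a validated procedure.

  # Copy everything to a temporary bucket first (names are examples)
  rclone copy prod-s3:big-bucket prod-s3:big-bucket-tmp

  # Delete and recreate the original bucket, then pre-shard it manually
  radosgw-admin bucket rm --bucket=big-bucket --purge-objects
  # (recreate big-bucket from the application / S3 client)
  radosgw-admin bucket reshard --bucket=big-bucket --num-shards=101

  # Copy the objects back
  rclone copy prod-s3:big-bucket-tmp prod-s3:big-bucket

  # List index instances left behind by (failed) reshards; as far as I
  # understand, removing them automatically is not recommended on multi-site
  radosgw-admin reshard stale-instances list

Note that, as far as I know, rclone only copies the current version of each
object, so this procedure does not preserve the history of a versioned
bucket.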
> On Wed, Dec 1, 2021 at 4:32 AM Gilles Mocellin
> <gilles.mocellin@xxxxxxxxxxxxxx> wrote:
> > Hello,
> >
> > We see large omap objects warnings on the RGW bucket index pool.
> > The objects' OMAP keys relate to objects in one identified big bucket.
> >
> > Context:
> > ========
> > We use S3 storage for an application, with ~1.5 M objects.
> >
> > The production cluster is "replicated" with rclone cron jobs to another
> > distant cluster.
> >
> > We have for the moment only one big bucket (23 shards), but we are
> > working on a multi-bucket solution. The problem is not there.
> >
> > One other important piece of information: the bucket is versioned. We
> > don't really have versions or delete markers, due to the way the
> > application works. It's mainly a way to allow recovery, as we don't have
> > backups, due to the expected storage volume. Versioning + replication
> > should cover most of the restoration use cases.
> >
> > First, we don't have large omap objects in the production cluster, only
> > on the replicated / backup one.
> >
> > Differences between the two clusters:
> > - production is a 5-node cluster with SSDs for RocksDB+WAL, 2 TB SCSI
> >   10k drives in RAID0 + battery-backed cache;
> > - the backup cluster is a 13-node cluster without SSDs, only 8 TB HDDs
> >   on direct HBAs.
> >
> > Both clusters use erasure coding for the RGW buckets data pool (3+2 on
> > the production one, 8+2 on the backup one).
> >
> > First observed facts:
> > =====================
> >
> > Both clusters have the same number of S3 objects in the main bucket.
> > I've seen that there are 10x more objects in the RGW buckets index pool
> > in the prod cluster than in the backup cluster.
> > On these objects, there are 4x more OMAP keys in the backup cluster.
> >
> > Example, with rados ls:
> > - 311 objects in defaults.rgw.buckets.index (prod cluster)
> > - 3157 objects in MRS4.rgw.buckets.index (backup cluster)
> >
> > In the backup cluster, we have 22 objects with more than 200,000 OMAP
> > keys; that's why we have a warning.
> > Searching in the production cluster, I can see around 60,000 OMAP keys
> > max on objects.
> >
> > Root cause?
> > ===========
> >
> > It seems we have too many OMAP keys, and even too many objects, in the
> > index pool of our backup cluster. But why? And how to remove the
> > orphans?
> >
> > I've already tried:
> > - radosgw-admin bucket check --fix --check-objects (still running)
> > - rgw-orphan-list (but it was interrupted last night after 5 hours)
> >
> > As I understand it, the last script does the reverse of what I need:
> > show objects that don't have indexes pointing to them?
> > The radosgw-admin bucket check will perhaps rebuild indexes, but will it
> > remove unused ones?
> >
> > Workaround?
> > ===========
> >
> > How can I get rid of the unused index objects and omap keys?
> > Of course, I can add more shards, but I think it would be better to
> > solve the root cause if I can.

--------
[1] https://tracker.ceph.com/issues/51429
[2] https://tracker.ceph.com/issues/50785
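PS: for anyone hitting the same warning, one way to see which index objects
carry the large OMAP counts is something like the following. A sketch, not
necessarily the exact commands we used; the pool name is our backup
cluster's index pool, and the head/sort part is just for convenience.

  # Count OMAP keys per object in the bucket index pool, largest first
  pool=MRS4.rgw.buckets.index
  rados -p "$pool" ls | while read -r obj; do
      echo "$(rados -p "$pool" listomapkeys "$obj" | wc -l) $obj"
  done | sort -rn | head -20

Anything above osd_deep_scrub_large_omap_object_key_threshold (200,000 by
default on recent releases, which matches the threshold mentioned above) is
what triggers the LARGE_OMAP_OBJECTS warning.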