On Friday, 25 March 2022 at 22:40:01 CET, David Orman wrote:
> Hi Gilles,
>
> Did you ever figure this out? Also, your rados ls output indicates that the
> prod cluster has fewer objects in the index pool than the backup cluster,
> or am I misreading this?
>
> David

Yes, we have more objects on our backup cluster.

Some news since then:

- Searching around, I'm more and more convinced that we have remaining index
  objects due to automatic resharding, and especially failed auto resharding
  (we hit that bug: https://tracker.ceph.com/issues/51429 [1]).
- As it seems that many of these objects, and the ones with large omap, come
  from one big bucket, we tried to recreate it (see the command sketch
  further down, before the quoted thread):
  * copy everything to a new temporary bucket,
  * recreate the bucket,
  * manually reshard it to a suitable value,
  * and copy the objects back from the temporary bucket.

One new thing also: we have configured a second cluster on another site and
activated multi-site between them, so there is no more dynamic resharding.
But here again we hit a bug, this time concerning multi-tenancy, and
replication is not working... https://tracker.ceph.com/issues/50785 [2]

In conclusion, our Octopus cluster still has objects with large omap in our
RGW index pool...
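For reference, the bucket recreation looked roughly like this. This is a
sketch only: the bucket name, the rclone remote and the shard count are
made-up examples, not our real ones, and the stale-instances check at the
end is something I am still evaluating rather than a validated procedure.

  # Copy everything to a temporary bucket first (names are examples)
  rclone copy prod-s3:big-bucket prod-s3:big-bucket-tmp

  # Delete and recreate the original bucket, then pre-shard it manually
  radosgw-admin bucket rm --bucket=big-bucket --purge-objects
  # (recreate big-bucket from the application / S3 client)
  radosgw-admin bucket reshard --bucket=big-bucket --num-shards=101

  # Copy the objects back
  rclone copy prod-s3:big-bucket-tmp prod-s3:big-bucket

  # List index instances left behind by (failed) reshards; as far as I
  # understand, removing them automatically is not recommended on multi-site
  radosgw-admin reshard stale-instances list

Note that, as far as I know, rclone only copies the current version of each
object, so this procedure does not preserve the history of a versioned
bucket.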
> On Wed, Dec 1, 2021 at 4:32 AM Gilles Mocellin
> <gilles.mocellin@xxxxxxxxxxxxxx> wrote:
> > Hello,
> >
> > We see large omap objects warnings on the RGW bucket index pool.
> > The objects' OMAP keys relate to objects in one identified big bucket.
> >
> > Context:
> > ========
> > We use S3 storage for an application, with ~1.5 M objects.
> >
> > The production cluster is "replicated" with rclone cron jobs to another
> > distant cluster.
> >
> > We have for the moment only one big bucket (23 shards), but we are
> > working on a multi-bucket solution. The problem is not there.
> >
> > One other important piece of information: the bucket is versioned. We
> > don't really have versions or delete markers, due to the way the
> > application works. It's mainly a way to allow recovery, as we don't have
> > backups, due to the expected storage volume. Versioning + replication
> > should cover most of the restoration use cases.
> >
> > First, we don't have large omap objects in the production cluster, only
> > on the replicated / backup one.
> >
> > Differences between the two clusters:
> > - production is a 5-node cluster with SSDs for RocksDB+WAL, 2 TB SCSI
> >   10k drives in RAID0 + battery-backed cache;
> > - the backup cluster is a 13-node cluster without SSDs, only 8 TB HDDs
> >   on direct HBAs.
> >
> > Both clusters use erasure coding for the RGW buckets data pool (3+2 on
> > the production one, 8+2 on the backup one).
> >
> > First observed facts:
> > =====================
> >
> > Both clusters have the same number of S3 objects in the main bucket.
> > I've seen that there are 10x more objects in the RGW buckets index pool
> > in the prod cluster than in the backup cluster.
> > On these objects, there are 4x more OMAP keys in the backup cluster.
> >
> > Example, with rados ls:
> > - 311 objects in defaults.rgw.buckets.index (prod cluster)
> > - 3157 objects in MRS4.rgw.buckets.index (backup cluster)
> >
> > In the backup cluster, we have 22 objects with more than 200,000 OMAP
> > keys; that's why we have a warning.
> > Searching in the production cluster, I can see around 60,000 OMAP keys
> > max on objects.
> >
> > Root cause?
> > ===========
> >
> > It seems we have too many OMAP keys, and even too many objects, in the
> > index pool of our backup cluster. But why? And how to remove the
> > orphans?
> >
> > I've already tried:
> > - radosgw-admin bucket check --fix --check-objects (still running)
> > - rgw-orphan-list (but it was interrupted last night after 5 hours)
> >
> > As I understand it, the last script does the reverse of what I need:
> > show objects that don't have indexes pointing to them?
> > The radosgw-admin bucket check will perhaps rebuild indexes, but will it
> > remove unused ones?
> >
> > Workaround?
> > ===========
> >
> > How can I get rid of the unused index objects and omap keys?
> > Of course, I can add more shards, but I think it would be better to
> > solve the root cause if I can.

--------
[1] https://tracker.ceph.com/issues/51429
[2] https://tracker.ceph.com/issues/50785
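PS: for anyone hitting the same warning, one way to see which index objects
carry the large OMAP counts is something like the following. A sketch, not
necessarily the exact commands we used; the pool name is our backup
cluster's index pool, and the head/sort part is just for convenience.

  # Count OMAP keys per object in the bucket index pool, largest first
  pool=MRS4.rgw.buckets.index
  rados -p "$pool" ls | while read -r obj; do
      echo "$(rados -p "$pool" listomapkeys "$obj" | wc -l) $obj"
  done | sort -rn | head -20

Anything above osd_deep_scrub_large_omap_object_key_threshold (200,000 by
default on recent releases, which matches the threshold mentioned above) is
what triggers the LARGE_OMAP_OBJECTS warning.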