Hello,
We see large omap objects warnings on the RGW bucket index pool.
The OMAP keys of these objects all belong to one identified big bucket.
Context:
=========
We use S3 storage for an application holding ~1.5 M objects.
The production cluster is "replicated" to another, distant cluster with
rclone cron jobs.
For the moment we have only one big bucket (23 shards); we are working
on a multi-bucket solution, but that is not the issue here.
One other important detail: the bucket is versioned. We don't really
have versions or delete markers, due to the way the application works.
Versioning is mainly a recovery mechanism, as we don't have backups
given the expected storage volume. Versioning + replication should
cover most of the restoration use cases.
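For context, here is a sketch of how the shard layout and fill levels can be inspected with stock radosgw-admin commands ("mybucket" below is a placeholder for the real bucket name):

```shell
# Per-shard object counts and fill status for every bucket;
# flags buckets that are resharding candidates.
radosgw-admin bucket limit check

# Per-bucket stats: object count, size, index shard count.
radosgw-admin bucket stats --bucket=mybucket
```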
First, we don't have large omap objects in the production cluster, only
on the replicated / backup one.
Differences between the two clusters:
- production is a 5-node cluster with SSDs for RocksDB+WAL, 2 TB SCSI
10k drives in RAID0 with battery-backed cache.
- backup is a 13-node cluster without SSDs, only 8 TB HDDs behind a
direct HBA.
Both clusters use erasure coding for the RGW bucket data pool (3+2 on
the production one, 8+2 on the backup one).
First observations:
===================
Both clusters have the same number of S3 objects in the main bucket.
However, there are ~10x more objects in the RGW bucket index pool of
the backup cluster than in the production cluster, and about 4x more
OMAP keys per index object in the backup cluster.
Example, with rados ls:
- 311 objects in defaults.rgw.buckets.index (prod cluster)
- 3157 objects in MRS4.rgw.buckets.index (backup cluster)
In the backup cluster, we have 22 objects with more than 200000 OMAP
keys, which is why we get the warning.
Searching in the production cluster, I see at most around 60000 OMAP
keys per object.
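For reference, a sketch of how these numbers can be reproduced, assuming a recent Ceph release (the pool name is taken from the backup cluster above):

```shell
# Which pool/PGs/objects triggered the LARGE_OMAP_OBJECTS warning,
# plus the threshold it fires at (200000 keys by default on
# recent releases; older ones expose it via the daemon socket).
ceph health detail
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold

# Rank index objects by OMAP key count, largest first.
# Slow on big pools: one listomapkeys call per object.
pool=MRS4.rgw.buckets.index
for obj in $(rados -p "$pool" ls); do
  printf '%s %s\n' "$(rados -p "$pool" listomapkeys "$obj" | wc -l)" "$obj"
done | sort -rn | head -20
```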
Root cause?
============
It seems we have too many OMAP keys, and even too many objects, in the
index pool of our backup cluster. But why? And how do we remove the
orphans?
I've already tried:
- radosgw-admin bucket check --fix --check-objects (still running)
- rgw-orphan-list (but it was interrupted last night after 5 hours)
As I understand it, the latter tool does the reverse of what I need:
it lists data objects that no index entry points to?
The radosgw-admin bucket check will perhaps rebuild index entries, but
will it remove unused ones?
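One known source of leftover index objects is stale bucket instances left behind by resharding. If the backup cluster runs Nautilus or later, these can be listed and cleaned up; a sketch (note the rm subcommand must not be run on multisite deployments, which shouldn't apply to an rclone-based copy):

```shell
# List bucket instance entries no longer referenced by any bucket.
radosgw-admin reshard stale-instances list

# Remove them (do NOT run this on a multisite deployment).
radosgw-admin reshard stale-instances rm
```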
Workaround?
============
How can I get rid of the unused index objects and OMAP keys?
Of course, I can add more shards, but I think it would be better to
solve the root cause if I can.
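For completeness, manual resharding would look like the sketch below ("mybucket" and the shard count are placeholders; the usual guideline is to stay well under 100000 objects per shard, the default warning ratio):

```shell
# Reshard the bucket index to 53 shards (placeholder values;
# prime-ish counts spread keys more evenly across shards).
radosgw-admin bucket reshard --bucket=mybucket --num-shards=53
```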
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx