Re: millions of hex 80 0_0000 omap keys in single index shard for single bucket

Christopher Durham <caduceus42@xxxxxxx> · Thu, 21 Sep 2023 16:21:44 +0000 (UTC)

Hi Casey,
This is indeed a multisite setup. On the other side,

# radosgw-admin sync status
shows that the oldest incremental change not applied is about a minute old. That is consistent over a number of minutes: the oldest incremental change is always a minute or two old.

However:
# radosgw-admin bucket sync status --bucket bucket-in-question
shows a number of shards always behind, although the number varies.
The number of objects on each side in that bucket is close, and to this point I have attributed the difference to replication lag.

One thing that came to mind is that the code that writes to, say, foo/bar/baz/objects ... will often delete the objects quickly after creating them. Perhaps the replication doesn't occur to the other side before they are deleted? Would that perhaps contribute to this?

Not sure how this relates to the objects ending in '/' though, although they are in the same prefix hierarchy.

To get out of this situation, what do I need to do:
1. radosgw-admin bucket sync init --bucket bucket-in-question on both sides?
2. manually delete the 0_0000 objects in rados? (yuk).

I've done #1 before, when the other side of the multisite was down for a while. That has not happened in the current situation (the link between sites has not been down).

Thanks for anything you or others can offer.
-Chris

   On Wednesday, September 20, 2023 at 07:33:07 PM MDT, Casey Bodley <cbodley@xxxxxxxxxx> wrote:  

 these keys starting with "<80>0_" appear to be replication log entries
for multisite. can you confirm that this is a multisite setup? is the
'bucket sync status' mostly caught up on each zone? in a healthy
multisite configuration, these log entries would eventually get
trimmed automatically
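
To see how much of a shard is made up of these entries, the key listing can be tallied offline. The following is a minimal sketch, not an official tool: it assumes the output of `rados listomapkeys` has been saved to a file, that each key sits on its own line, and that the byte displayed as <80> is a raw 0x80. The function name `tally_index_keys` and the labels it prints are hypothetical, and keys containing embedded newlines would be miscounted.

```python
from collections import Counter

MARKER = b"\x80"  # assumption: the byte displayed as <80> in the key listing


def tally_index_keys(raw: bytes) -> Counter:
    """Count omap keys per leading tag in a saved listomapkeys dump."""
    counts = Counter()
    for key in raw.split(b"\n"):
        if not key:
            continue  # skip blank lines, e.g. the trailing newline
        if not key.startswith(MARKER):
            counts["plain"] += 1  # ordinary object index entry
            continue
        # The tag is the text between the marker byte and the first underscore.
        tag, sep, _rest = key[1:].partition(b"_")
        if sep:
            counts["<80>" + tag.decode("ascii", "replace") + "_"] += 1
        else:
            counts["<80>(other)"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py /tmp/shard15.keys
    with open(sys.argv[1], "rb") as f:
        for label, n in tally_index_keys(f.read()).most_common():
            print(label, n)
```

Running this against each shard's dump over time would show whether the "<80>0_" count is shrinking (trimming is working) or only growing.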

On Wed, Sep 20, 2023 at 7:08 PM Christopher Durham <caduceus42@xxxxxxx> wrote:
>
> I am using ceph 17.2.6 on Rocky 8.
> I have a system that started giving me large omap object warnings.
>
> I tracked this down to a specific index shard for a single s3 bucket.
>
> rados -p <indexpool> listomapkeys .dir.<zoneid>.bucketid.nn.shardid
> shows over 3 million keys for that shard. There are only about 2
> million objects in the entire bucket according to a listing of the bucket
> and radosgw-admin bucket stats --bucket bucketname. No other shard
> has anywhere near this many index objects. Perhaps it should be noted that this
> shard is the highest numbered shard for this bucket. For a bucket with
> 16 shards, this is shard 15.
>
> If I look at the list of omapkeys generated, there are *many*
> beginning with "<80>0_0000", almost the entire set of the three + million
> keys in the shard. These are index objects in the so-called 'ugly' namespace. The rest of the omapkeys appear to be normal.
>
> The 0_0000 after the <80> indicates some sort of 'bucket log index' according to src/cls/rgw/cls_rgw.cc.
> However, using some sed magic previously discussed here, I ran:
>
> rados -p <indexpool> getomapval .dir.<zoneid>.bucketid.nn.shardid --omap-key-file /tmp/key.txt
>
> Where /tmp/key.txt contains only the funny <80>0_0000 key name without a newline
>
> The output of this shows, in a hex dump, the object name to which the index
> refers, which was at one time a valid object.
>
> However, that object no longer exists in the bucket, and based on expiration policy, was
> previously deleted. Let's say, in the hex dump, that the object was:
>
> foo/bar/baz/object1.bin
>
> The prefix foo/bar/baz/ used to have 32 objects, say foo/bar/baz/{object1.bin, object2.bin, ... }
> An s3api listing shows that those objects no longer exist (and that is OK, as they were previously deleted).
> BUT, now, there is a weirdo object left in the bucket:
>
> foo/bar/baz/ <- with the slash at the end, and it is an object not a PRE (fix).
>
> All objects under foo/ have a 3 day lifecycle expiration. If I wait (at most) 3 days, the weirdo object with '/'
> at the end will be deleted, or I can delete it manually using aws s3api. But either way, the log index
> objects, <80>0_0000.... remain.
>
> The bucket in question is heavily used. But with over 3 million of these <80>0_0000 objects (and growing)
> in a single shard, I am currently at a loss as to what to do or how to stop this from occurring.
> I've poked around at a few other buckets, and I found a few others that have this problem (a few hundred <80>0_000.... index objects in a shard), but nowhere near enough to cause the large omap warning that led me to this post.
>
> Any ideas?
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
