Response inline:

On Fri, Mar 5, 2021 at 11:00 AM Benoît Knecht <bknecht@xxxxxxxxxxxxx> wrote:
>
> On Friday, March 5th, 2021 at 15:20, Drew Weaver <drew.weaver@xxxxxxxxxx> wrote:
> > Sorry to sound clueless, but no matter what I search for on El Goog, I can't figure out how to answer the question of whether dynamic sharding is enabled in our environment.
> >
> > It's not configured as true in the config files, but it is the default.
> >
> > Is there a radosgw-admin command to determine whether or not it's enabled in the running environment?
>
> If `rgw_dynamic_resharding` is not explicitly set to `false` in your environment, I think we can assume dynamic resharding is enabled. And if any of your buckets have more than one shard and you didn't reshard them manually, you'll know for sure dynamic resharding is working; you can check the number of shards on a bucket with `radosgw-admin bucket stats --bucket=<name>`, there's a `num_shards` field. You can also check with `radosgw-admin bucket limit check` if any of your buckets are about to be resharded.
>
> Assuming dynamic resharding is enabled and none of your buckets are about to be resharded, I would then find out which object has too many OMAP keys by grepping the logs. The name of the object will contain the bucket ID (also found in the output of `radosgw-admin bucket stats`), so you'll know which bucket is causing the issue. And you can check how many OMAP keys are in each shard of that bucket index using
>
> ```
> for obj in $(rados -p default.rgw.buckets.index ls | grep eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4); do
>   printf "%-60s %7d\n" $obj $(rados -p default.rgw.buckets.index listomapkeys $obj | wc -l)
> done
> ```
>
> (where `eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4` is your bucket ID). If the number of keys is very uneven amongst the shards, there's probably an issue that needs to be addressed. If they are relatively even but slightly above the warning threshold, it's probably a versioned bucket, and it should be safe to simply increase the threshold.

As this is somewhat relevant, jumping in here... We're seeing the same "large omap objects" warning, and it only happens with versioned buckets/objects. Looking through the logs, we can find a few instances; here is one of them:

```
cluster 2021-03-29T14:22:12.822291+0000 osd.55 (osd.55) 1074 : cluster [WRN] Large omap object found. Object: 18:7004a547:::.dir.d99b34b6-5e94-4b64-a189-e23a3fabd712.326812.1.10:head PG: 18.e2a5200e (18.e) Key count: 264199 Size (bytes): 107603375
```

We check the bucket (and we do have dynamic sharding enabled):

```
"num_shards": 23,
"num_objects": 1524017
```

Doing the math, something seems off with that key count: 1.52 million objects across 23 shards works out to roughly 66,000 objects per shard, yet this shard reports 260k+ keys. We check:

```
root@ceph01:~# rados -p res22-vbo1a.rgw.buckets.index listomapkeys .dir.d99b34b6-5e94-4b64-a189-e23a3fabd712.326812.1.10 | wc -l
264239
```

Sure enough, it is more than 200,000 keys, just as the alert indicates. However, why did it not reshard further?

Here's the kicker: we _only_ see this with versioned buckets/objects. I don't see anything in the documentation that indicates this is a known issue with sharding, but perhaps there is something going on with versioned buckets/objects.

Can anyone shed some light on this, or suggest how to deal with it? It sounds like you expect this behavior with versioned buckets, so we must be missing something.
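In case it helps anyone reproduce the numbers, a loop along the lines of Benoît's example above, pointed at our pool and bucket ID (both taken from the warning and command output above), lists the key count for every shard of this index at once. A rough sketch, untested as written:

```
# Rough sketch: count OMAP keys in every shard of the bucket index,
# adapting the loop quoted above. Pool name and bucket ID are the ones
# from our log warning; adjust them for a different bucket.
pool=res22-vbo1a.rgw.buckets.index
bucket_id=d99b34b6-5e94-4b64-a189-e23a3fabd712.326812.1
for obj in $(rados -p "$pool" ls | grep "$bucket_id"); do
  printf "%-60s %8d\n" "$obj" "$(rados -p "$pool" listomapkeys "$obj" | wc -l)"
done
```

Each output line is one index shard object and its key count, so an unbalanced shard stands out immediately.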
And here is the relevant config on our side:

```
root@ceph01:~# ceph config get osd rgw_dynamic_resharding
true
root@ceph01:~# ceph config get osd rgw_max_objs_per_shard
100000
root@ceph01:~# ceph config get osd rgw_max_dynamic_shards
1999
```

Based on that config, the bucket should be resharding further given the key counts in each of the shards. I checked all 23 shards and they all sit at ~260,000 keys.

Thanks,
David

> Cheers,
>
> --
> Ben
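P.S. One more back-of-the-envelope check on the figures already posted, in case it helps frame the question. This is just shell arithmetic on the numbers above; I'm not claiming to know whether the resharder keys off object counts or OMAP key counts:

```
# Plain shell arithmetic on the numbers reported above; no RGW internals assumed.
objs=1524017          # num_objects from bucket stats
shards=23             # num_shards from bucket stats
keys_per_shard=264239 # listomapkeys count on shard .10

echo "objects per shard: $(( objs / shards ))"            # ~66261, under rgw_max_objs_per_shard=100000
echo "total index keys:  $(( keys_per_shard * shards ))"  # ~6.08M across the whole index
awk -v k=$(( keys_per_shard * shards )) -v o="$objs" \
    'BEGIN { printf "index keys per object: %.1f\n", k / o }'   # ~4.0
```

If the trigger is objects per shard, we are still under the 100,000 limit; if it is keys per shard, we are far past it. That gap is what I'm trying to understand.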