Hi,

I am currently dealing with a cluster that's been in use for 5 years
and during that time has never had its radosgw usage log trimmed. Now
that the cluster has been upgraded to Nautilus (and has completed a
full deep-scrub), it is in a permanent state of HEALTH_WARN because of
one large omap object:

$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool '.usage'

As far as I can tell, there are two thresholds that can trigger that
warning:

* The default omap object size warning threshold,
  osd_deep_scrub_large_omap_object_value_sum_threshold, is 1G.
* The default omap object key count warning threshold,
  osd_deep_scrub_large_omap_object_key_threshold, is 200000.

In this case, this was the original situation:

osd.6 [WRN] : Large omap object found. Object: 15:169282cd:::usage.20:head Key count: 5834118 Size (bytes): 917351868

So that's 5.8M keys (way above the key count threshold) and 875 MiB
total object size (below the size threshold, but not by much).

The usage log in this case was no longer needed that far back, so I
trimmed it to keep only the entries from this year (radosgw-admin
usage trim --end-date 2018-12-31), a process that took upward of an
hour. After the trim (and a deep-scrub of the PG in question¹), my
situation looks like this:

osd.6 [WRN] Large omap object found. Object: 15:169282cd:::usage.20:head Key count: 1185694 Size (bytes): 187061564

So both the key count and the total object size have shrunk by about
80%, which is roughly what you'd expect when trimming 5 years of usage
log down to 1 year. However, the key count is still almost 6 times the
threshold.

I am aware that I can silence the warning by increasing
osd_deep_scrub_large_omap_object_key_threshold by a factor of 10, but
that's not my question. My question is what I can do to prevent the
usage log from creating such large omap objects in the first place.

Now, there's something else you should know about this radosgw: it is
configured with the defaults for usage log sharding:

rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 1

... and this cluster's radosgw is used almost exclusively by a single
application user. So the fact that it's happy to shard the usage log
32 ways is irrelevant as long as it puts one user's usage log entirely
into one shard.

So, I am assuming that if I bump rgw_usage_max_user_shards up to, say,
16 or 32, all *new* usage log entries will be sharded. But I am not
aware of any way to reshard the *existing* usage log. Is there such a
thing?

Otherwise, it seems like the only options in this situation are to
clear the usage log altogether and tweak the sharding knobs², which
should at least keep the problem from reappearing, or to bump
osd_deep_scrub_large_omap_object_key_threshold and just live with the
large object³.

Also, is anyone aware of any adverse side effects of increasing these
thresholds, and/or changing the usage log sharding settings, that I
should keep in mind here?

Thanks in advance for your thoughts.

Cheers,
Florian

¹For anyone reading this in the archives because they've run into the
same problem and are wondering how to find out which PGs in a pool
have too-large objects, here's a jq one-liner:

ceph --format=json pg ls-by-pool <poolname> \
  | jq '.pg_stats[]|select(.stat_sum.num_large_omap_objects>0)'
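²For concreteness, here is roughly the sharding change I have in mind.
Treat this as a sketch rather than something I've already tested, and
note that the rgw section name (client.rgw.gateway1) is just a
placeholder for whatever the actual instance is called:

# ceph.conf on the radosgw host: spread a single user's usage log
# across more shards (as far as I understand, this only affects
# usage log entries written after the change)
[client.rgw.gateway1]
    rgw usage max shards = 32
    rgw usage max user shards = 16

...followed by a restart of that radosgw instance to pick up the new
values.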
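³And the alternative: raising the key-count warning threshold so the
existing object no longer trips the deep-scrub check. The value
2000000 here is just an arbitrary number comfortably above my current
1185694 keys, not a recommendation:

$ ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 2000000

As with the trim, I'd expect the warning to update only after the PG
holding usage.20 has been deep-scrubbed again. To keep an eye on the
key count of the offending object in the meantime:

$ rados -p .usage listomapkeys usage.20 | wc -l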