Hi Aaron,
The data_log objects are storing logs for multisite replication. Judging
by the pool name '.us-phx2.log', this cluster was created before jewel.
Are you (or were you) using multisite or radosgw-agent?
If not, you'll want to turn off the logging (log_meta and log_data ->
false) in your zonegroup configuration using 'radosgw-admin zonegroup
get/set', restart gateways, then delete the data_log and meta_log objects.
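A minimal sketch of that sequence, assuming a single zonegroup and the pool name from your health warning (double-check the edited JSON before setting it back, and only run 'period update --commit' if a realm/period is configured):

  radosgw-admin zonegroup get > zonegroup.json
  # edit zonegroup.json: set "log_meta": "false" and "log_data": "false"
  radosgw-admin zonegroup set --infile zonegroup.json
  radosgw-admin period update --commit
  # restart all radosgw daemons, then remove the unused log objects, e.g.:
  rados -p .us-phx2.log ls | grep '^data_log\.' | xargs -r -n 1 rados -p .us-phx2.log rm

The same rados pipeline can be repeated for the meta log objects (their name prefix varies by version, so list the pool first to confirm what's there).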
If it is multisite, then the logs should all be trimmed in the
background as long as all peer zones are up-to-date. There was a bug
prior to 12.2.12 that prevented datalog trimming
(http://tracker.ceph.com/issues/38412).
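If you do need to nudge it along on a fixed version, something along these lines should work; this is only a sketch, and the exact trim options (markers vs. date ranges) vary by release, so check the radosgw-admin man page for yours:

  radosgw-admin sync status                     # confirm all peer zones report caught up
  radosgw-admin datalog status --shard-id=27    # shard number from your health warning
  radosgw-admin datalog trim --shard-id=27      # plus whatever end marker/date option your release expects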
Casey
On 6/11/19 5:41 PM, Aaron Bassett wrote:
Hey all,
I've just recently upgraded some of my larger rgw clusters to the latest luminous, and now I'm getting a lot of warnings about large omap objects. Most of them were on the bucket indices, and I've taken care of those by sharding where appropriate. However, on two of my clusters I have a large object in the rgw log pool.
ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.us-phx2.log'
Search the cluster log for 'Large omap object found' for more details.
2019-06-11 10:50:04.583354 7f8d2b737700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 51:b9a904f6:::data_log.27:head Key count: 15903755 Size (bytes): 2305116273
I'm not sure what to make of this. I don't see much chatter on the mailing lists about the log pool, other than a thread about swift lifecycles, which I don't use. The log pool is pretty large, which makes it difficult to poke around in:
.us-phx2.log 51 118GiB 0.03 384TiB 12782413
That said, I did a little poking around, and the pool looks like a mix of these data_log objects, some delete hints, and mostly a lot of objects starting with dates that point to different s3 pools. The object referenced in the osd log has 15912300 omap keys, and spot checking it, it looks like it's mostly referencing a pool we use with our dns resolver. We have a dns service that checks rgw endpoint health by uploading and deleting an object every few minutes, and adds/removes endpoints from the A record accordingly.
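For anyone following along, a check like this shows the key count and a sample of the keys (a sketch, using the object name from the health warning above):

  rados -p .us-phx2.log listomapkeys data_log.27 | wc -l
  rados -p .us-phx2.log listomapkeys data_log.27 | head   # sample the keys to see which buckets they reference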
So I guess I've got a few questions:
1) What is the nature of the data in the data_log.* objects in the log pool? Is it safe to remove, or is it more like a binlog that needs to be intact from the beginning of time?
2) With the log pool in general, beyond the individual objects' omap sizes, is there any concern about overall size? If so, is there a way to force it to truncate? I see some log commands in radosgw-admin, but the documentation is light.
Thanks,
Aaron
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com