Hi Aaron,
The data_log objects are storing logs for multisite replication. Judging
by the pool name '.us-phx2.log', this cluster was created before jewel.
Are you (or were you) using multisite or radosgw-agent?
If not, you'll want to turn off the logging (log_meta and log_data ->
false) in your zonegroup configuration using 'radosgw-admin zonegroup
get/set', restart gateways, then delete the data_log and meta_log objects.
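A minimal sketch of that sequence, assuming a single zonegroup and the pool name from your health warning (double-check the edited JSON before setting it back, and only run 'period update --commit' if a realm/period is configured):

  radosgw-admin zonegroup get > zonegroup.json
  # edit zonegroup.json: set "log_meta": "false" and "log_data": "false"
  radosgw-admin zonegroup set --infile zonegroup.json
  radosgw-admin period update --commit
  # restart all radosgw daemons, then remove the unused log objects, e.g.:
  rados -p .us-phx2.log ls | grep '^data_log\.' | xargs -r -n 1 rados -p .us-phx2.log rm

The same rados pipeline can be repeated for the meta log objects (their name prefix varies by version, so list the pool first to confirm what's there).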
If it is multisite, then the logs should all be trimmed in the
background as long as all peer zones are up-to-date. There was a bug
prior to 12.2.12 that prevented datalog trimming
(http://tracker.ceph.com/issues/38412).
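If you do need to nudge it along on a fixed version, something along these lines should work; this is only a sketch, and the exact trim options (markers vs. date ranges) vary by release, so check the radosgw-admin man page for yours:

  radosgw-admin sync status                     # confirm all peer zones report caught up
  radosgw-admin datalog status --shard-id=27    # shard number from your health warning
  radosgw-admin datalog trim --shard-id=27      # plus whatever end marker/date option your release expects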
Casey
On 6/11/19 5:41 PM, Aaron Bassett wrote:
Hey all,
I've just recently upgraded some of my larger rgw clusters to the latest luminous, and now I'm getting a lot of warnings about large omap objects. Most of them were on the bucket indices, and I've taken care of those by sharding where appropriate. However, on two of my clusters I have a large object in the rgw log pool.
ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.us-phx2.log'
Search the cluster log for 'Large omap object found' for more details.
2019-06-11 10:50:04.583354 7f8d2b737700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 51:b9a904f6:::data_log.27:head Key count: 15903755 Size (bytes): 2305116273
I'm not sure what to make of this. I don't see much chatter on the mailing lists about the log pool, other than a thread about swift lifecycles, which I don't use. The log pool is pretty large, which makes it difficult to poke around in:
.us-phx2.log 51 118GiB 0.03 384TiB 12782413
That said, I did a little poking around, and the pool looks like a mix of these data_log objects, some delete hints, and mostly a lot of objects starting with dates that point to different s3 pools. The object referenced in the osd log has 15912300 omap keys, and spot checking it, it looks like it's mostly referencing a pool we use with our dns resolver. We have a dns service that checks rgw endpoint health by uploading and deleting an object every few minutes, and adds/removes endpoints from the A record accordingly.
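For anyone following along, a check like this shows the key count and a sample of the keys (a sketch, using the object name from the health warning above):

  rados -p .us-phx2.log listomapkeys data_log.27 | wc -l
  rados -p .us-phx2.log listomapkeys data_log.27 | head   # sample the keys to see which buckets they reference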
So I guess I've got a few questions:
1) What is the nature of the data in the data_log.* objects in the log pool? Is it safe to remove, or is it more like a binlog that needs to be intact from the beginning of time?
2) With the log pool in general, beyond the individual objects' omap sizes, is there any concern about overall size? If so, is there a way to force it to truncate? I see some log commands in radosgw-admin, but the documentation is light.
Thanks,
Aaron
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com