On 06/14/2017 05:59 AM, Dan van der Ster wrote:
Dear ceph users,
Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:
2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok
We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we set up to enable indexless buckets.
I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:
1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?
It's a good idea to turn that off, yes.
First, make sure that you have configured a default realm/zonegroup/zone:
$ radosgw-admin realm default --rgw-realm <realm name>
  (find the realm name with 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default
Then you can modify the zonegroup (aka region):
$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/"log_meta": "true"/"log_meta": "false"/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json
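If jq is available, the same edit can be scripted without sed (a sketch,
assuming log_meta only appears under the zones array, as in the region
dump [1] below):
$ radosgw-admin zonegroup get | jq '.zones[].log_meta = "false"' > zonegroup.json
then feed the result to 'zonegroup set' as above.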
Then commit the updated period configuration:
$ radosgw-admin period update --commit
Verify that the resulting period contains "log_meta": "false". Take care
with future radosgw-admin commands on the zone/zonegroup, as they may
revert log_meta back to true [3].
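One quick way to check (a sketch; 'radosgw-admin period get' prints the
current period configuration):
$ radosgw-admin period get | grep log_meta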
2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?
There is automated mdlog trimming logic in master, but not in
jewel/kraken, and that logic won't be triggered if there is only one zone [4].
3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?
You can inspect the osd's omap directory, e.g.:
$ du -sh /var/lib/ceph/osd/ceph-0/current/omap
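To check every filestore OSD on a host in one go (a sketch, assuming the
default /var/lib/ceph/osd/ceph-* layout):
$ du -sh /var/lib/ceph/osd/ceph-*/current/omap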
4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/
5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?
There is a 'radosgw-admin mdlog trim' command that can be used to trim
them one --shard-id (from 0 to 63) at a time. An entire log shard can be
trimmed with:
$ radosgw-admin mdlog trim --shard-id 0 \
    --period 8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1
*However*, there is a risk that bulk operations on large omaps will
affect cluster health by taking down OSDs. Not only can this bulk
deletion take long enough to trigger the osd/filestore suicide timeouts,
but the resulting leveldb compaction after deletion is also likely to
block other omap operations and hit those timeouts. This seems likely in
your case, given that you're already having issues with scrub.
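If you do go ahead with trimming, one way to spread the load out is to
loop over the 64 shards with a pause between each. A rough sketch (the
30-second sleep is an arbitrary example, and the period id is the one
from your mdlog status output):
$ for shard in $(seq 0 63); do
    radosgw-admin mdlog trim --shard-id $shard \
      --period 8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1
    sleep 30   # arbitrary pause to give compaction a chance to settle
  done
Watch for slow requests in 'ceph -s' between shards and stop if OSDs
start to struggle.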
Apologies if there are already good docs about this that eluded my googling.
Best Regards,
Dan
[1] region get:
{
    "id": "61c0ff1a-4330-405a-9eb1-bb494d4daf82",
    "name": "default",
    "api_name": "default",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "61c59385-085d-4caa-9070-63a3868dccb6",
    "zones": [
        {
            "id": "61c59385-085d-4caa-9070-63a3868dccb6",
            "name": "default",
            "endpoints": [],
            "log_meta": "true",
            "log_data": "false",
            "bucket_index_max_shards": 32,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        },
        {
            "name": "indexless",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "552868ad-8898-4afb-a775-911297961cee"
}
[2] mdlog status:
No --period given, using current period=8d4fcb63-c314-4f9a-b3b3-0e61719ec258
[
    ...
    {
        "marker": "1_1475568296.712634_3.1",
        "last_update": "2016-10-04 08:04:56.712634Z"
    },
    ...
[3] http://tracker.ceph.com/issues/20320
[4] http://tracker.ceph.com/issues/20319
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com