Re: radosgw: scrub causing slow requests in the md log

On 06/14/2017 05:59 AM, Dan van der Ster wrote:
Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we set up to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

  1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?

It's a good idea to turn that off, yes.

First, make sure that you have configured a default realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm <realm name>
  (the realm name can be found with 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default

Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/"log_meta": "true"/"log_meta": "false"/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take care with future radosgw-admin commands on the zone/zonegroup, as they may revert log_meta back to true [1].
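The verification step can be scripted. This is a minimal sketch, assuming the period JSON prints the flag as a quoted string ("log_meta": "true"), as in the region dump quoted in this thread. check_log_meta is a hypothetical helper name, not part of radosgw-admin.

```shell
#!/bin/sh
# Sketch: read period (or zonegroup) JSON on stdin and report whether any
# zone still has log_meta enabled.
# check_log_meta is a hypothetical helper; it assumes the flag is printed
# as a quoted string, as in the zonegroup dump shown in this thread.
check_log_meta() {
    json=$(cat)
    # Fail loudly if the field is missing (e.g. the admin command failed).
    if ! printf '%s\n' "$json" | grep -q '"log_meta"'; then
        echo "no log_meta field found"
        return 2
    fi
    # Tolerate optional whitespace after the colon.
    if printf '%s\n' "$json" | grep -Eq '"log_meta": *"true"'; then
        echo "log_meta still enabled"
        return 1
    fi
    echo "log_meta disabled"
}

# Usage, on a node with admin credentials:
#   radosgw-admin period get | check_log_meta
```

Re-running the check after any later zone/zonegroup change should catch the revert described in [1].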


  2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?

There is automated mdlog trimming logic in master, but not jewel/kraken. And this logic won't be triggered if there is only one zone [2].


  3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?

You can inspect the OSD's omap directory (filestore). For example, for osd.0 with the default cluster name: du -sh /var/lib/ceph/osd/ceph-0/current/omap
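To check every OSD on a host in one go, something like the following could work. It is only a sketch, assuming filestore OSDs mounted under /var/lib/ceph/osd with a current/omap subdirectory. omap_usage is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the on-disk size of each OSD's omap directory on this host.
# omap_usage is a hypothetical helper; the default root below assumes the
# usual filestore layout and can be overridden with the first argument.
omap_usage() {
    osd_root="${1:-/var/lib/ceph/osd}"
    for d in "$osd_root"/*/current/omap; do
        # Skip the unexpanded glob when no OSD directories match.
        if [ -d "$d" ]; then
            du -sh "$d"
        fi
    done
    return 0
}

# Usage:
#   omap_usage
#   omap_usage /some/other/osd/root
```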


  4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/

  5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?

There is a 'radosgw-admin mdlog trim' command that can be used to trim them one --shard-id (from 0 to 63) at a time. An entire log shard can be trimmed with:

$ radosgw-admin mdlog trim --shard-id 0 --period 8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1

*However*, there is a risk that bulk operations on large omaps will affect cluster health by taking down OSDs. Not only can this bulk deletion take long enough to trigger the osd/filestore suicide timeouts, the resulting leveldb compaction after deletion is likely to block other omap operations and hit the timeouts as well. This seems likely in your case, based on the fact that you're already having issues with scrub.
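If you do decide to trim, the per-shard command can be wrapped in a loop over shards 0 to 63 with a pause between shards, so cluster health can be checked as it goes. This is only a sketch: the pause does not remove the risk described above, since a single shard's deletion and compaction can still hit the timeouts. RGW_ADMIN, PERIOD, END_TIME, PAUSE and trim_all_shards are illustrative names; the period and end time are copied from the example command.

```shell
#!/bin/sh
# Sketch: trim the metadata log one shard at a time, pausing between shards.
# All variable and function names here are illustrative assumptions.
RGW_ADMIN="${RGW_ADMIN:-radosgw-admin}"
PERIOD="${PERIOD:-8d4fcb63-c314-4f9a-b3b3-0e61719ec258}"
END_TIME="${END_TIME:-2020-1-1}"
PAUSE="${PAUSE:-60}"   # seconds to wait between shards

trim_all_shards() {
    shard=0
    while [ "$shard" -le 63 ]; do
        echo "trimming shard $shard"
        # Stop at the first failure rather than pushing on blindly.
        $RGW_ADMIN mdlog trim --shard-id "$shard" \
            --period "$PERIOD" --end-time "$END_TIME" || return 1
        sleep "$PAUSE"
        shard=$((shard + 1))
    done
}

# Usage (watch 'ceph -s' in another terminal while this runs):
#   trim_all_shards
```

Setting RGW_ADMIN=echo and PAUSE=0 prints the commands without running them, which is a cheap way to review what would be executed.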


Apologies if there are already good docs about this, which eluded my googling.

Best Regards,
Dan


[1] region get:

{
     "id": "61c0ff1a-4330-405a-9eb1-bb494d4daf82",
     "name": "default",
     "api_name": "default",
     "is_master": "true",
     "endpoints": [],
     "hostnames": [],
     "hostnames_s3website": [],
     "master_zone": "61c59385-085d-4caa-9070-63a3868dccb6",
     "zones": [
         {
             "id": "61c59385-085d-4caa-9070-63a3868dccb6",
             "name": "default",
             "endpoints": [],
             "log_meta": "true",
             "log_data": "false",
             "bucket_index_max_shards": 32,
             "read_only": "false"
         }
     ],
     "placement_targets": [
         {
             "name": "default-placement",
             "tags": []
         },
         {
             "name": "indexless",
             "tags": []
         }
     ],
     "default_placement": "default-placement",
     "realm_id": "552868ad-8898-4afb-a775-911297961cee"
}

[2] mdlog status:

No --period given, using current period=8d4fcb63-c314-4f9a-b3b3-0e61719ec258
[
...
     {
         "marker": "1_1475568296.712634_3.1",
         "last_update": "2016-10-04 08:04:56.712634Z"
     },
...
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] http://tracker.ceph.com/issues/20320
[2] http://tracker.ceph.com/issues/20319