On 06/14/17 11:59, Dan van der Ster wrote:
> Dear ceph users,
>
> Today we had O(100) slow requests which were caused by deep-scrubbing
> of the metadata log:
>
> 2017-06-14 11:07:55.373184 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
> deep-scrub starts
> ...
> 2017-06-14 11:22:04.143903 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
> request 480.140904 seconds old, received at 2017-06-14
> 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
> meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
> 0=[] ondisk+write+known_if_redirected e7752) currently waiting for
> scrub
> ...
> 2017-06-14 11:22:06.729306 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
> deep-scrub ok

This looks just like my problem in my ceph-devel thread, "another scrub
bug? blocked for > 10240.948831 secs", except that your scrub eventually
finished (mine ran for hours before I stopped it manually), and I'm not
using rgw.

Sage commented that it is likely related to snaps being removed at some
point and interacting with scrub.

Restarting the OSD mentioned there (osd.155 in your case) will fix it
for now. Tuning the scrub settings also changes the behavior (with the
defaults, it happens more rarely than with what I had before).

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
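
[Editor's note: the workaround described above (restart the affected
OSD, then inspect and tune the scrub settings) can be sketched roughly
as follows. This assumes a systemd-based deployment with an admin
socket available; the interval value is illustrative, not a
recommendation from the thread.]

```shell
# Restart the OSD that the blocked requests are stuck on
# (osd.155 here, taken from the log lines quoted above).
systemctl restart ceph-osd@155

# Inspect the scrub-related settings the daemon is currently running
# with, via its admin socket.
ceph daemon osd.155 config show | grep scrub

# Illustrative tuning: stretch the deep-scrub interval to 14 days
# (1209600 seconds) so deep scrubs occur less often. Applied at
# runtime to all OSDs; persist it in ceph.conf if it helps.
ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'
```

Note that `injectargs` changes only the running daemons; the setting
reverts on restart unless it is also written to the configuration.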