On 06/14/17 11:59, Dan van der Ster wrote:
> Dear ceph users,
>
> Today we had O(100) slow requests which were caused by deep-scrubbing
> of the metadata log:
>
> 2017-06-14 11:07:55.373184 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
> deep-scrub starts
> ...
> 2017-06-14 11:22:04.143903 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
> request 480.140904 seconds old, received at 2017-06-14
> 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
> meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
> 0=[] ondisk+write+known_if_redirected e7752) currently waiting for
> scrub
> ...
> 2017-06-14 11:22:06.729306 osd.155
> [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
> deep-scrub ok

This looks just like my problem in my ceph-devel thread, "another scrub
bug? blocked for > 10240.948831 secs", except that your scrub eventually
finished (mine ran for hours before I stopped it manually), and I'm not
using rgw.

Sage commented that it is likely related to snaps being removed at some
point and interacting with scrub.

Restarting the OSD mentioned there (osd.155 in your case) will fix it
for now. Tuning the scrub settings also changes the behavior (with the
defaults, it happens more rarely than with what I had before).

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
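
[Editor's note: the workaround described above (restart the affected
OSD, then inspect and tune the scrub settings) can be sketched roughly
as follows. This assumes a systemd-based deployment with an admin
socket available; the interval value is illustrative, not a
recommendation from the thread.]

```shell
# Restart the OSD that the blocked requests are stuck on
# (osd.155 here, taken from the log lines quoted above).
systemctl restart ceph-osd@155

# Inspect the scrub-related settings the daemon is currently running
# with, via its admin socket.
ceph daemon osd.155 config show | grep scrub

# Illustrative tuning: stretch the deep-scrub interval to 14 days
# (1209600 seconds) so deep scrubs occur less often. Applied at
# runtime to all OSDs; persist it in ceph.conf if it helps.
ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'
```

Note that `injectargs` changes only the running daemons; the setting
reverts on restart unless it is also written to the configuration.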