My write-heavy cluster periodically struggles under the additional load created by deep-scrub. As I have instrumented the cluster more, it has become clear that something I cannot explain is happening in the way PGs are scheduled for deep-scrub.

Please refer to these images [0][1] for two graphical representations of how deep-scrub goes awry in my cluster. They show two separate incidents. Both show a period of "happy" scrubs and deep-scrubs with stable writes/second across the cluster, followed by an approximately 5x jump in concurrent deep-scrubs during which client IO is cut by nearly 50%.

The first image (deep-scrub-issue1.jpg) shows a happy cluster with low numbers of scrubs and deep-scrubs running until about 10pm, when something triggers deep-scrubs to increase roughly 5x and remain high until I manually run 'ceph osd set nodeep-scrub' at approximately 10am. While the higher number of concurrent deep-scrubs persists, IOPS drop significantly due to OSD spindle contention, which prevents qemu/rbd clients from writing normally.

The second image (deep-scrub-issue2.jpg) shows a similar ~5x jump in concurrent deep-scrubs and the associated drop in writes/second. This image also includes a summary of 'dump historic ops' output, which shows the expected jump in the slowest ops in the cluster.

Does anyone have an idea of what is happening when the spike in concurrent deep-scrubs occurs, and how to prevent the adverse effects, short of disabling deep-scrub permanently?

0: http://www.mikedawson.com/deep-scrub-issue1.jpg
1: http://www.mikedawson.com/deep-scrub-issue2.jpg

Thanks,
Mike Dawson
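P.S. For reference, the manual workaround I'm applying during these incidents is roughly the following (run from an admin/monitor host; unset reverses it once the cluster settles):

    # temporarily prevent new deep-scrubs from starting cluster-wide
    ceph osd set nodeep-scrub

    # ... later, once client IO has recovered, allow deep-scrubs again
    ceph osd unset nodeep-scrub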