Hello all,

We have been running RADOS in a large-scale, production public cloud environment for a few months now, and we are generally happy with it. However, we experience performance problems whenever deep scrubbing is active. We managed to reproduce them on our test cluster running Emperor, even while it was otherwise idle.

We ran a simple rados bench test:

    rados -p bench bench -b 524288 120 write

and could easily and consistently reach 230 MB/s [1]. Then we manually initiated a deep scrub and re-ran the test. As the results show [2], throughput dropped significantly and even stalled for a few seconds. Now imagine that behavior in a loaded cluster with thousands of VMs on top of it. A drop like this is unacceptable for our service.

Are there any tools we are not aware of for controlling (or possibly pausing) deep scrubbing, and/or for monitoring its progress?

Also, since I believe it would be bad practice to disable deep scrubbing, do you have any recommendations on how to work around (or even solve) this issue?

[1] https://pithos.okeanos.grnet.gr/public/yzq5fHNkl5OnjgLOPlRTA3
[2] https://pithos.okeanos.grnet.gr/public/OjIGAQFBGwcsBNMHtA8ir5

Kind Regards,
--
Filippos <philipgian@xxxxxxxx>
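For reference, the test procedure described above can be sketched roughly as the following shell script: a baseline bench run, a manually triggered deep scrub, then a second bench run. This is only a sketch under stated assumptions. The post does not say how the deep scrub was initiated, so using `ceph osd deep-scrub` on a single OSD is an assumption, and `OSD_ID` and `SCRUB_WAIT` are hypothetical parameters added for illustration.

```shell
#!/bin/sh
# Sketch of the reproduction: baseline bench, deep scrub, bench again.
# Assumption (not from the original post): the deep scrub is requested
# with "ceph osd deep-scrub" on one OSD. OSD_ID and SCRUB_WAIT are
# hypothetical knobs chosen for illustration.

POOL="${POOL:-bench}"          # pool used in the original test
BLOCK_SIZE=524288              # 512 KiB writes, as in the original test
DURATION=120                   # seconds per bench run
OSD_ID="${OSD_ID:-0}"          # hypothetical: OSD to deep-scrub
SCRUB_WAIT="${SCRUB_WAIT:-10}" # hypothetical: seconds to let the scrub start

# Write benchmark, identical to the command in the post.
run_bench() {
    rados -p "$POOL" bench -b "$BLOCK_SIZE" "$DURATION" write
}

# Request a deep scrub of the placement groups on the given OSD.
start_deep_scrub() {
    ceph osd deep-scrub "$OSD_ID"
}

run_experiment() {
    echo "== baseline =="
    run_bench
    echo "== deep scrub on osd.$OSD_ID =="
    start_deep_scrub
    sleep "$SCRUB_WAIT"
    echo "== during deep scrub =="
    run_bench
}

# Only run when invoked with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    run_experiment
fi
```

Invoked as `sh repro.sh run` (a hypothetical file name), it would print the baseline results, trigger the scrub, and then print the results measured while the scrub is running.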