YMMV of course, but the first thing that struck me was the constraint on scrub times. Constraining them to fewer hours can mean that more scrubs run in parallel. If you truly have off-hours for client ops (Graphite / Grafana are great for visualizing that), that might make sense, but in my 24x7 OpenStack world there is little or no off-hour lull, so I let scrubs run all the time.

You might also raise osd_deep_scrub_interval. The default is one week; I raise that to four weeks as a compromise between aggressive protection and the realities of contention.

-- Anthony, currently looking for a new Ceph opportunity.

>> In our ceph.conf we already have these settings active:
>>
>> osd max scrubs = 1
>> osd scrub begin hour = 20
>> osd scrub end hour = 7
>> osd op threads = 16
>> osd client op priority = 63
>> osd recovery op priority = 1
>> osd op thread timeout = 5
>>
>> osd disk thread ioprio class = idle
>> osd disk thread ioprio priority = 7
>>
> You're missing the most powerful scrub dampener there is:
> osd_scrub_sleep = 0.1
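
For reference, the suggestions in this thread could be combined into a ceph.conf fragment roughly like the one below. Treat it as a sketch under stated assumptions, not a tested configuration: putting the options under [osd] is my assumption, the four-week interval is written out as 2419200 seconds (4 x 7 x 86400) since osd_deep_scrub_interval is expressed in seconds, and the begin/end hour lines are simply omitted so that scrubs are not confined to a nightly window.

```ini
[osd]
# Assumption: section placement; adjust to match your existing ceph.conf layout.

# Keep at most one concurrent scrub per OSD (already set in the quoted config).
osd max scrubs = 1

# No "osd scrub begin hour" / "osd scrub end hour" lines here:
# leaving them out lets scrubs run at any hour instead of 20:00-07:00.

# Deep scrub every four weeks instead of the one-week default.
# 4 weeks * 7 days * 86400 s = 2419200 s
osd deep scrub interval = 2419200

# Pause between scrub chunks to leave room for client I/O.
osd scrub sleep = 0.1
```

Whether four weeks and a 0.1 sleep are right for a given cluster depends on its capacity and client load, so watch scrub completion times after changing them.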