Simon Leinen at Switch did a great post recently about the impact of scrub on their cluster(s):

	http://blog.simon.leinen.ch/2015/02/ceph-deep-scrubbing-impact.html

Basically the 2 week deep scrub interval kicks in on exactly a 2 week cycle and the cluster goes crazy for a few hours and then does nothing (but client IO) for the next two weeks.

The options governing this are:

OPTION(osd_scrub_min_interval, OPT_FLOAT, 60*60*24)    // if load is low
OPTION(osd_scrub_max_interval, OPT_FLOAT, 7*60*60*24)  // regardless of load
OPTION(osd_deep_scrub_interval, OPT_FLOAT, 60*60*24*7) // once a week
OPTION(osd_scrub_load_threshold, OPT_FLOAT, 0.5)

That is, if the load is < .5 (probably almost never on a real cluster) it will scrub every day; otherwise (regardless of load) it will scrub each PG at least once a week.

Several things we can do here:

1- Maybe the shallow scrub interval should be less than the deep scrub interval?

2- There is a new feature for hammer that limits scrub to certain times of day, contributed by Xinze Chi:

OPTION(osd_scrub_begin_hour, OPT_INT, 0)
OPTION(osd_scrub_end_hour, OPT_INT, 24)

That is, by default, scrubs can happen at any time. You can use this to limit them to certain hours of the night, or whatever is appropriate for your cluster. That only sort of helps, though; Simon's scrub frenzy will still happen one day a week, all at once (or maybe spread over 2 nights).

3- We can spread them out during the allowed window. But how to do that? We could make the scrub interval randomly +/- a value of up to 50% of the total interval. Or we could somehow look at the current rate of scrubbing (average time to completion for the current pool, maybe), or look at the total number of items in the scrub queue?

4- Ric pointed out to me that even if we spread these out, scrubbing at full speed has an impact. Even if we do all the prioritization magic we can, there will still be a bunch of large IOs in the queue. What if we have a hard throttle on the scrub rate, in objects per second and/or bytes per second? In the end the same number of IOs traverse the queue and potentially interfere with client IO, but they would be spread out over a longer period of time and be less noticeable (i.e., slow down client IOs from different workloads and not all the same workload).

I'm not totally convinced this is an improvement over a strategy where we have only 1 scrub IO in flight at all times, but that isn't quite how scrub schedules itself so it's hard to compare it that way, and in the end the user experience and perceived impact should be lower...

5- Auto-adjust the above scrub rate based on the total amount of data, scrub interval, and scrub hours so that we are scrubbing at the slowest rate possible that meets the schedule. We'd have to be slightly clever to have the right feedback in place here (rough sketch at the end of this mail)...

Thoughts?
sage
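
---

A minimal sketch of what 3 and 5 could look like, standalone C++ rather than actual OSD code paths; the names and inputs here (jittered_interval, min_scrub_rate, total_bytes) are made up for illustration and not existing Ceph options or functions:

#include <cstdint>
#include <random>

// Idea 3: jitter the scrub deadline by up to +/-50% of the configured
// interval, so PGs that were stamped at the same time drift apart
// instead of all becoming eligible in the same hour.
double jittered_interval(double base_interval_secs, std::mt19937& rng)
{
  std::uniform_real_distribution<double> jitter(-0.5, 0.5);
  return base_interval_secs * (1.0 + jitter(rng));
}

// Idea 5: the slowest scrub rate (bytes/sec) that still covers all data
// once per interval, given that scrubbing is only allowed during
// [begin_hour, end_hour) each day.
double min_scrub_rate(uint64_t total_bytes,
                      double interval_secs,
                      int begin_hour, int end_hour)
{
  double hours_per_day = end_hour - begin_hour;          // e.g. 4 for a 1..5 window
  double allowed_fraction = hours_per_day / 24.0;        // fraction of wall time usable
  double usable_secs = interval_secs * allowed_fraction; // scrub-eligible seconds per cycle
  return static_cast<double>(total_bytes) / usable_secs; // bytes/sec floor
}

For a rough sense of scale: 100 TB of data, a one-week interval, and a 4-hour nightly window works out to a floor of roughly 1 GB/s cluster-wide, which is the sort of number the feedback in 5 would need to keep the actual rate near.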