On Thu, 12 Nov 2015, Dan van der Ster wrote:
> On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Thu, 12 Nov 2015, Dan van der Ster wrote:
> >> Hi,
> >>
> >> Firstly, we just had a look at the new
> >> osd_scrub_interval_randomize_ratio option and found that it doesn't
> >> really solve the deep scrubbing problem. Given the default options,
> >>
> >> osd_scrub_min_interval = 60*60*24
> >> osd_scrub_max_interval = 7*60*60*24
> >> osd_scrub_interval_randomize_ratio = 0.5
> >> osd_deep_scrub_interval = 60*60*24*7
> >>
> >> we understand that the new option changes the min interval to the
> >> range 1-1.5 days. However, this doesn't do anything for the thundering
> >> herd of deep scrubs which will happen every 7 days. We've found a
> >> configuration that should randomize deep scrubbing across two weeks,
> >> e.g.:
> >>
> >> osd_scrub_min_interval = 60*60*24*7
> >> osd_scrub_max_interval = 100*60*60*24 // effectively disabling this option
> >> osd_scrub_load_threshold = 10 // effectively disabling this option
> >> osd_scrub_interval_randomize_ratio = 2.0
> >> osd_deep_scrub_interval = 60*60*24*7
> >>
> >> but that (a) doesn't allow shallow scrubs to run daily and (b) is so
> >> far off the defaults that it's basically an abuse of the intended
> >> behaviour.
> >>
> >> So we'd like to simplify how deep scrubbing can be randomized. Our PR
> >> (http://github.com/ceph/ceph/pull/6550) adds a new option
> >> osd_deep_scrub_randomize_ratio which controls a coin flip to randomly
> >> turn scrubs into deep scrubs. The default is tuned so roughly 1 in 7
> >> scrubs will be run deeply.
> >
> > The coin flip seems reasonable to me. But wouldn't it also/instead make
> > sense to apply the randomize ratio to the deep_scrub_interval? By just
> > adding in the random factor here:
> >
> > https://github.com/ceph/ceph/pull/6550/files#diff-dfb9ddca0a3ee32b266623e8fa489626R3247
> >
> > That is what I would have expected to happen, and if the coin flip is also
> > there then you have two knobs controlling the same thing, which'll cause
> > confusion...
> >
>
> That was our first idea. But that has a couple of downsides:
>
> 1. If we use the random range for the deep scrub intervals, e.g.
> deep every 1-1.5 weeks, we still get quite bursty scrubbing until it
> randomizes over a period of many weeks/months. And I fear it might
> even lead to lower frequency harmonics of many concurrent deep scrubs.
> Using a coin flip guarantees uniformity starting immediately from time
> zero.
>
> 2. In our PR osd_deep_scrub_interval is still used as an upper limit
> on how long a PG can go without being deeply scrubbed. This way
> there's no confusion such as PGs going undeep-scrubbed longer than
> expected. (In general, I think this random range is unintuitive and
> difficult to tune (e.g. see my 2 week deep scrubbing config above).)

Fair enough..

> For me, the most intuitive configuration (maintaining randomness) would be:
>
> a. drop the osd_scrub_interval_randomize_ratio because there is no
> shallow scrub thundering herd problem (AFAIK), and it just complicates
> the configuration. (But this is in a stable release now so I don't
> know if you want to back it out).

I'm inclined to leave it, even if it complicates config: just because we
haven't noticed the shallow scrub thundering herd doesn't mean it doesn't
exist, and I fully expect that it is there. Also, if the shallow scrubs
are lumpy and we're promoting some of them to deep scrubs, then the deep
scrubs will be lumpy too.

> b. perform a (usually shallow) scrub every
> osd_scrub_interval_(min/max) depending on a self-tuning load
> threshold.

Yep, although as you note we have some work to do to get there. :)

> c. do a coin flip each (b) to occasionally turn it into deep scrub.

Works for me.

> optionally: d. remove osd_deep_scrub_randomize_ratio and replace it
> with osd_scrub_interval_min/osd_deep_scrub_interval.

There is no osd_deep_scrub_randomize_ratio. Do you mean replace
osd_deep_scrub_interval with osd_deep_scrub_{min,max}_interval?

> >> Secondly, we'd also like to discuss the osd_scrub_load_threshold
> >> option, where we see two problems:
> >> - the default is so low that it disables all the shallow scrub
> >> randomization on all but completely idle clusters.
> >> - finding the correct osd_scrub_load_threshold for a cluster is
> >> surely unclear/difficult and probably a moving target for most prod
> >> clusters.
> >>
> >> Given those observations, IMHO the smart Ceph admin should set
> >> osd_scrub_load_threshold = 10 or higher, to effectively disable that
> >> functionality. In the spirit of having good defaults, I therefore
> >> propose that we increase the default osd_scrub_load_threshold (to at
> >> least 5.0) and consider removing the load threshold logic completely.
> >
> > This sounds reasonable to me. It would be great if we could use a 24-hour
> > average as the baseline or something so that it was self-tuning (e.g., set
> > threshold to .8 of daily average), but that's a bit trickier. Generally
> > all for self-tuning, though... too many knobs...
>
> Yes, but we probably would need to make your 0.8 a function of the
> stddev of the loadavg over a day, to handle clusters with flat
> loadavgs as well as varying ones.
>
> In order to randomly spread the deep scrubs across the week, it's
> essential to give each PG many opportunities to scrub throughout the
> week. If PGs are only shallow scrubbed once a week (at interval_max),
> then every scrub would become a deep scrub and we again have the
> thundering herd problem.
>
> I'll push 5.0 for now.

Sounds good. I would still love to see someone tackle the auto-tuning
approach, though!
:)

sage
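
For anyone following the thread, the coin-flip scheme discussed above (scrub at a regular interval, promote a scrub to a deep scrub with some probability, and keep osd_deep_scrub_interval as a hard upper limit) can be sketched as a toy simulation. This is not the OSD code and makes simplifying assumptions: every PG gets exactly one scrub opportunity per day, every PG was last deep scrubbed at time zero, and load thresholds are ignored. The names simulate, deep_ratio, SCRUB_INTERVAL and DEEP_SCRUB_INTERVAL are hypothetical and only stand in for the options mentioned in the thread.

```python
import random

DAY = 24 * 60 * 60
SCRUB_INTERVAL = 1 * DAY        # stands in for osd_scrub_min_interval
DEEP_SCRUB_INTERVAL = 7 * DAY   # stands in for osd_deep_scrub_interval


def simulate(num_pgs, num_days, deep_ratio, seed=0):
    """Return a list with the number of deep scrubs on each simulated day.

    Toy model: each PG is scrubbed once per day. A scrub becomes a deep
    scrub if the PG is overdue (hard upper limit) or if a coin flip with
    probability deep_ratio comes up heads.
    """
    rng = random.Random(seed)
    last_deep = [0] * num_pgs   # assumed worst case: all deep scrubbed at t=0
    deep_per_day = []
    for day in range(1, num_days + 1):
        now = day * SCRUB_INTERVAL
        count = 0
        for pg in range(num_pgs):
            overdue = now - last_deep[pg] >= DEEP_SCRUB_INTERVAL
            if overdue or rng.random() < deep_ratio:
                last_deep[pg] = now
                count += 1
        deep_per_day.append(count)
    return deep_per_day


if __name__ == "__main__":
    # Compare no coin flip against a roughly 1-in-7 coin flip.
    for ratio in (0.0, 1.0 / 7):
        print(ratio, simulate(1000, 28, ratio))
```

With deep_ratio = 0.0 every deep scrub in this model lands on days 7, 14, and so on, which is the thundering herd. With a nonzero ratio some deep scrubs happen on every day, although how even the spread is depends entirely on the assumptions above.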