Re: scrub randomization and load threshold

On Thu, Nov 12, 2015 at 4:10 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 12 Nov 2015, Dan van der Ster wrote:
>> On Thu, Nov 12, 2015 at 2:29 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Thu, 12 Nov 2015, Dan van der Ster wrote:
>> >> Hi,
>> >>
>> >> Firstly, we just had a look at the new
>> >> osd_scrub_interval_randomize_ratio option and found that it doesn't
>> >> really solve the deep scrubbing problem. Given the default options,
>> >>
>> >> osd_scrub_min_interval = 60*60*24
>> >> osd_scrub_max_interval = 7*60*60*24
>> >> osd_scrub_interval_randomize_ratio = 0.5
>> >> osd_deep_scrub_interval = 60*60*24*7
>> >>
>> >> we understand that the new option changes the min interval to the
>> >> range 1-1.5 days. However, this doesn't do anything for the thundering
>> >> herd of deep scrubs which will happen every 7 days. We've found a
>> >> configuration that should randomize deep scrubbing across two weeks,
>> >> e.g.:
>> >>
>> >> osd_scrub_min_interval = 60*60*24*7
>> >> osd_scrub_max_interval = 100*60*60*24 // effectively disabling this option
>> >> osd_scrub_load_threshold = 10 // effectively disabling this option
>> >> osd_scrub_interval_randomize_ratio = 2.0
>> >> osd_deep_scrub_interval = 60*60*24*7
>> >>
>> >> but that (a) doesn't allow shallow scrubs to run daily and (b) is so
>> >> far off the defaults that it's basically an abuse of the intended
>> >> behaviour.
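As a rough illustration of the scheduling described above, here is a minimal Python sketch. It assumes, as the message says, that osd_scrub_interval_randomize_ratio stretches the minimum interval by a uniform random factor between 1 and 1 + ratio, and that osd_scrub_max_interval is the point where a scrub is forced. `scrub_window` is a hypothetical helper, not the OSD's actual code.

```python
import random

DAY = 60 * 60 * 24  # seconds, matching the 60*60*24 notation above

def scrub_window(last_scrub, min_interval, max_interval, randomize_ratio, rng=random):
    """Return (earliest, deadline) for a PG's next shallow scrub, in seconds.

    Hypothetical sketch: the earliest start is the minimum interval stretched by
    a uniform random factor in [1, 1 + randomize_ratio]; max_interval is the
    point at which the scrub is forced regardless of load.
    """
    r = rng.random()  # uniform in [0.0, 1.0)
    earliest = last_scrub + min_interval * (1.0 + randomize_ratio * r)
    deadline = last_scrub + max_interval
    return earliest, deadline

# Default options: the earliest start lands 1 to 1.5 days after the last scrub.
print(scrub_window(0, DAY, 7 * DAY, 0.5))

# The two-week workaround: the earliest start lands 7 to 21 days later, so with
# osd_deep_scrub_interval = 7 days every such scrub is already due to run deep.
print(scrub_window(0, 7 * DAY, 100 * DAY, 2.0))
```

In this sketch the workaround spreads deep scrubs over 7 to 21 days only because every scrub is pushed past the deep interval, which is why it rules out daily shallow scrubs.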
>> >>
>> >> So we'd like to simplify how deep scrubbing can be randomized. Our PR
>> >> (http://github.com/ceph/ceph/pull/6550) adds a new option
>> >> osd_deep_scrub_randomize_ratio which controls a coin flip to randomly
>> >> turn scrubs into deep scrubs. The default is tuned so roughly 1 in 7
>> >> scrubs will be run deeply.
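The coin flip can be sketched in a few lines of Python. This is an illustration under assumptions, not the PR's C++ code: `should_deep_scrub` is a hypothetical name, and it includes the rule, stated later in the thread, that osd_deep_scrub_interval remains an upper bound on time without a deep scrub.

```python
import random

DAY = 60 * 60 * 24

def should_deep_scrub(now, last_deep_scrub, deep_interval, deep_randomize_ratio, rng=random):
    """Decide whether a scrub that is about to start should run deeply."""
    # Overdue PGs are always deep-scrubbed: deep_interval stays an upper bound.
    if now - last_deep_scrub >= deep_interval:
        return True
    # Otherwise promote the scrub with probability deep_randomize_ratio.
    return rng.random() < deep_randomize_ratio

# With a ratio of 0.15, roughly 15% of not-yet-overdue scrubs are promoted.
rng = random.Random(42)
trials = 100_000
promoted = sum(should_deep_scrub(DAY, 0, 7 * DAY, 0.15, rng) for _ in range(trials))
print(promoted / trials)
```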
>> >
>> > The coin flip seems reasonable to me.  But wouldn't it also/instead make
>> > sense to apply the randomize ratio to the deep_scrub_interval?  By just
>> > adding in the random factor here:
>> >
>> > https://github.com/ceph/ceph/pull/6550/files#diff-dfb9ddca0a3ee32b266623e8fa489626R3247
>> >
>> > That is what I would have expected to happen, and if the coin flip is also
>> > there then you have two knobs controlling the same thing, which'll cause
>> > confusion...
>> >
>>
>> That was our first idea. But that has a couple of downsides:
>>
>>   1.  If we use the random range for the deep scrub intervals, e.g.
>> deep every 1-1.5 weeks, we still get quite bursty scrubbing until it
>> randomizes over a period of many weeks/months. And I fear it might
>> even lead to lower-frequency harmonics of many concurrent deep scrubs.
>> Using a coin flip guarantees uniformity starting immediately from time
>> zero.
>>
>>   2. In our PR osd_deep_scrub_interval is still used as an upper limit
>> on how long a PG can go without being deeply scrubbed. This way
>> there's no confusion such as PGs going undeep-scrubbed longer than
>> expected. (In general, I think this random range is unintuitive and
>> difficult to tune; see e.g. my 2-week deep scrubbing config above.)
>
> Fair enough.
>
>> For me, the most intuitive configuration (maintaining randomness) would be:
>>
>>   a. drop the osd_scrub_interval_randomize_ratio because there is no
>> shallow scrub thundering herd problem (AFAIK), and it just complicates
>> the configuration. (But this is in a stable release now so I don't
>> know if you want to back it out).
>
> I'm inclined to leave it, even if it complicates config: just because we
> haven't noticed the shallow scrub thundering herd doesn't mean it doesn't
> exist, and I fully expect that it is there.  Also, if the shallow scrubs
> are lumpy and we're promoting some of them to deep scrubs, then the deep
> scrubs will be lumpy too.
>

Sounds good.

>>   b. perform a (usually shallow) scrub every
>> osd_scrub_{min,max}_interval depending on a self-tuning load
>> threshold.
>
> Yep, although as you note we have some work to do to get there.  :)
>
>>   c. do a coin flip each (b) to occasionally turn it into deep scrub.
>
> Works for me.
>
>>   optionally: d. remove osd_deep_scrub_randomize_ratio and replace it
>> with osd_scrub_min_interval/osd_deep_scrub_interval.
>
> There is no osd_deep_scrub_randomize_ratio.  Do you mean replace
> osd_deep_scrub_interval with osd_deep_scrub_{min,max}_interval?

osd_deep_scrub_randomize_ratio is the new option we proposed in the
PR. We chose 0.15 because it's roughly 1/7 (i.e.
osd_scrub_min_interval/osd_deep_scrub_interval = 1/7 in the default
config). But the coin flip could use
osd_scrub_min_interval/osd_deep_scrub_interval instead of adding this
extra option.

My preference would be to keep it separately configurable.
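To make the arithmetic concrete, a short Python sketch. It assumes one coin flip per daily scrub and ignores the osd_deep_scrub_interval upper bound; both helper names are hypothetical.

```python
DAY = 60 * 60 * 24

def derived_deep_ratio(scrub_min_interval, deep_scrub_interval):
    # Probability that makes roughly one scrub in (deep / min) a deep scrub.
    return scrub_min_interval / deep_scrub_interval

def mean_days_between_deep(p, scrub_period_days=1.0):
    # Ignoring the upper bound, the number of scrubs until the first promotion
    # is geometric with mean 1/p.
    return scrub_period_days / p

print(round(derived_deep_ratio(DAY, 7 * DAY), 4))  # 0.1429
print(round(mean_days_between_deep(0.15), 2))      # 6.67
```

So deriving the ratio from the existing options gives about 0.143, and the fixed 0.15 default gives a mean of about 6.7 days between deep scrubs under these assumptions.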

>> >> Secondly, we'd also like to discuss the osd_scrub_load_threshold
>> >> option, where we see two problems:
>> >>    - the default is so low that it disables all the shallow scrub
>> >> randomization on all but completely idle clusters.
>> >>    - finding the correct osd_scrub_load_threshold for a cluster is
>> >> surely unclear/difficult and probably a moving target for most prod
>> >> clusters.
>> >>
>> >> Given those observations, IMHO the smart Ceph admin should set
>> >> osd_scrub_load_threshold = 10 or higher, to effectively disable that
>> >> functionality. In the spirit of having good defaults, I therefore
>> >> propose that we increase the default osd_scrub_load_threshold (to at
>> >> least 5.0) and consider removing the load threshold logic completely.
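A minimal sketch of the load gate being discussed, assuming the check compares the 1-minute load average against osd_scrub_load_threshold only inside the window between the earliest start and the osd_scrub_max_interval deadline. `may_scrub_now` is hypothetical; times are in days for readability.

```python
def may_scrub_now(now, earliest, deadline, loadavg_1min, load_threshold):
    """Hypothetical load gate for one PG's next scrub."""
    if now < earliest:
        return False  # too early: minimum (randomized) interval not reached
    if now >= deadline:
        return True   # max interval reached: scrub regardless of load
    # Inside the window, scrub only if the host looks idle enough.
    return loadavg_1min < load_threshold

# A host with load 2.0 never passes a 0.5 threshold, so it only scrubs at the
# deadline; with a threshold of 5.0 it can scrub anywhere in the window.
print(may_scrub_now(2, 1, 7, 2.0, 0.5), may_scrub_now(2, 1, 7, 2.0, 5.0))
```

This is why a low threshold collapses the randomized window: on a busy cluster every scrub waits for its fixed deadline.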
>> >
>> > This sounds reasonable to me.  It would be great if we could use a 24-hour
>> > average as the baseline or something so that it was self-tuning (e.g., set
>> > threshold to .8 of daily average), but that's a bit trickier.  Generally
>> > all for self-tuning, though... too many knobs...
>>
>> Yes, but we probably would need to make your 0.8 a function of the
>> stddev of the loadavg over a day, to handle clusters with flat
>> loadavgs as well as varying ones.
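The difference between the two self-tuning ideas can be sketched with hourly load samples. Neither formula exists in Ceph: the 0.8 fraction is Sage's example, and the mean minus k times the standard deviation form (with k = 0.5) is one hypothetical reading of Dan's suggestion. Here a scrub may start when the load is at or below the threshold.

```python
import statistics

def fraction_threshold(daily_loadavgs, fraction=0.8):
    # Threshold as a fixed fraction of the daily average load.
    return fraction * statistics.fmean(daily_loadavgs)

def stddev_threshold(daily_loadavgs, k=0.5):
    # Hypothetical variant: the margin below the mean scales with the day's
    # spread, so a perfectly flat load gives threshold == mean.
    return statistics.fmean(daily_loadavgs) - k * statistics.pstdev(daily_loadavgs)

def scrub_hours(daily_loadavgs, threshold):
    # Hours in which a scrub could start (load at or below the threshold).
    return sum(1 for load in daily_loadavgs if load <= threshold)

flat = [2.0] * 24                  # constant load all day
varying = [1.0] * 12 + [3.0] * 12  # quiet nights, busy days

for name, day in (("flat", flat), ("varying", varying)):
    print(name,
          scrub_hours(day, fraction_threshold(day)),
          scrub_hours(day, stddev_threshold(day)))
```

In this toy profile the 0.8 rule never opens a window on the flat host, while the stddev-based threshold does; on the varying host both allow the 12 quiet hours.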
>>
>> In order to randomly spread the deep scrubs across the week, it's
>> essential to give each PG many opportunities to scrub throughout the
>> week. If PGs are only shallow scrubbed once a week (at interval_max),
>> then every scrub would become a deep scrub and we again have the
>> thundering herd problem.
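The point about scrub opportunities can be quantified with a simplified model: one PG scrubbed at a fixed period, each scrub forced deep once osd_deep_scrub_interval has elapsed since the last deep scrub, and otherwise promoted with probability p. `deep_fraction` is a hypothetical helper, not Ceph code; times are in days.

```python
import math

def deep_fraction(scrub_period, deep_interval, p):
    """Expected fraction of scrubs that run deeply in steady state.

    The number of scrubs per deep scrub is geometric in p, truncated at
    m = ceil(deep_interval / scrub_period), where the deep scrub is forced.
    """
    m = max(1, math.ceil(deep_interval / scrub_period))
    expected_scrubs_per_deep = sum((1 - p) ** j for j in range(m))
    return 1 / expected_scrubs_per_deep

print(deep_fraction(7, 7, 0.15))  # weekly scrubs: every scrub is deep
print(deep_fraction(1, 7, 0.15))  # daily scrubs: roughly 0.22
```

With weekly scrubs the coin flip has no effect at all. With daily scrubs the model gives about 0.22, somewhat above 0.15, because PGs that lose every flip for six days are forced deep on the seventh.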
>>
>> I'll push 5.0 for now.
>
> Sounds good.
>
> I would still love to see someone tackle the auto-tuning approach,
> though! :)

I should have some time next week to have a look, if nobody beats me to it.

-- dan

> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


