Re: scrub scheduling

On Sun, Feb 8, 2015 at 1:38 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> Simon Leinen at Switch did a great post recently about the impact of
> scrub on their cluster(s):
>
>         http://blog.simon.leinen.ch/2015/02/ceph-deep-scrubbing-impact.html
>
> Basically the 2 week deep scrub interval kicks in on exactly a 2 week
> cycle and the cluster goes crazy for a few hours and then does nothing
> (but client IO) for the next two weeks.
>
> The options governing this are:
>
> OPTION(osd_scrub_min_interval, OPT_FLOAT, 60*60*24)    // if load is low
> OPTION(osd_scrub_max_interval, OPT_FLOAT, 7*60*60*24)  // regardless of load
> OPTION(osd_deep_scrub_interval, OPT_FLOAT, 60*60*24*7) // once a week
> OPTION(osd_scrub_load_threshold, OPT_FLOAT, 0.5)
>
> That is, if the load is < .5 (probably almost never on a real cluster) it
> will scrub every day, otherwise (regardless of load) it will scrub each PG
> at least once a week.
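For illustration, the per-PG decision described above boils down to something like the following (a hypothetical Python sketch with invented names; not the actual OSD scheduling code):

```python
# Illustrative sketch of the interval logic: scrub daily when the load
# average is under the threshold, weekly regardless of load otherwise.
OSD_SCRUB_MIN_INTERVAL = 60 * 60 * 24        # one day, if load is low
OSD_SCRUB_MAX_INTERVAL = 7 * 60 * 60 * 24    # one week, regardless of load
OSD_SCRUB_LOAD_THRESHOLD = 0.5

def scrub_due(seconds_since_last_scrub, loadavg):
    """Decide whether a PG is due for a shallow scrub."""
    if loadavg < OSD_SCRUB_LOAD_THRESHOLD:
        return seconds_since_last_scrub >= OSD_SCRUB_MIN_INTERVAL
    return seconds_since_last_scrub >= OSD_SCRUB_MAX_INTERVAL
```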
>
> Several things we can do here:
>
> 1- Maybe the shallow scrub interval should be less than the deep scrub
> interval?
>
> 2- There is a new feature for hammer that limits scrub to certain times of
> day, contributed by Xinze Chi:
>
> OPTION(osd_scrub_begin_hour, OPT_INT, 0)
> OPTION(osd_scrub_end_hour, OPT_INT, 24)
>
> That is, by default, scrubs can happen at any time.  You can use this to
> limit to certain hours of the night, or whatever is appropriate for your
> cluster.  That only sort of helps, though; Simon's scrub frenzy will still
> happen one day a week, all at once (or maybe spread over 2 nights).
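The hour-window check itself is simple; one subtlety is a window that wraps past midnight (e.g. begin at 22, end at 6), which a naive range test gets wrong. A small sketch of how that check could look (my own illustration, not Xinze Chi's patch):

```python
def in_scrub_window(hour, begin_hour=0, end_hour=24):
    """True if `hour` (0-23) falls in the allowed scrub window.
    Handles windows that wrap past midnight, e.g. 22 -> 6."""
    if begin_hour <= end_hour:
        return begin_hour <= hour < end_hour
    # Wrapping window: allowed late at night or early in the morning.
    return hour >= begin_hour or hour < end_hour
```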
>
> 3- We can spread them out during the allowed window.  But how to do that?
> We could make the scrub interval randomly +/- a value of up to 50% of the
> total interval.  Or we could somehow look at the current rate of scrubbing
> (average time to completion for the current pool, maybe), or look at
> the total number of items in the scrub queue?
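The randomized-interval idea is easy to sketch: each PG's next scrub is scheduled at the base interval plus or minus up to 50% of it, so PGs that were last scrubbed at the same moment drift apart over successive cycles (again just an illustration of the proposal, not shipped code):

```python
import random

def jittered_interval(base_interval, ratio=0.5):
    """Next-scrub delay: the base interval +/- up to `ratio` of itself,
    drawn uniformly, so synchronized PGs spread out over time."""
    return base_interval * (1.0 + ratio * (2.0 * random.random() - 1.0))
```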
>
> 4- Ric pointed out to me that even if we spread these out, scrubbing at
> full speed has an impact.  Even if we do all the prioritization magic we
> can there will still be a bunch of large IOs in the queue.  What if we have
> a hard throttle on the scrub rate, objects per second and/or bytes per
> second?  In the end the same number of IOs traverse the queue and
> potentially interfere with client IO, but they would be spread out over a
> longer period of time and be less noticeable (i.e., slow down client IOs
> from different workloads and not all the same workload).  I'm not totally
> convinced this is an improvement over a strategy where we have only 1
> scrub IO in flight at all times, but that isn't quite how scrub schedules
> itself so it's hard to compare it that way, and in the end the user
> experience and perceived impact should be lower...
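A hard bytes-per-second cap like the one proposed here is typically a token bucket; a minimal sketch, assuming a caller that asks permission before issuing each scrub read (the class and its interface are invented for illustration):

```python
class ScrubThrottle:
    """Token-bucket limiter on scrub bytes/sec, so scrub IO trickles
    out over time instead of bursting at full speed."""

    def __init__(self, bytes_per_sec, burst_bytes):
        self.rate = bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, nbytes, now):
        """Consume `nbytes` of budget at time `now` if available."""
        # Refill tokens for the elapsed time, up to the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

A caller that gets False would simply requeue the scrub chunk and retry later, which is what spreads the same total IO over a longer period.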
>
> 5- Auto-adjust the above scrub rate based on the total amount of data,
> scrub interval, and scrub hours so that we are scrubbing at the slowest
> rate possible that meets the schedule.  We'd have to be slightly clever to
> have the right feedback in place here...
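The target rate in (5) is mostly arithmetic: total data divided by the scrub time actually available in the interval, given the daily window. A sketch (my own formulation of the idea, not an existing option):

```python
def min_scrub_rate(total_bytes, interval_secs, begin_hour=0, end_hour=24):
    """Slowest bytes/sec that still scrubs everything once per interval,
    given that scrubbing is only allowed during the daily hour window."""
    window_hours = (end_hour - begin_hour) % 24 or 24  # 0..24 -> full day
    available_secs = interval_secs * (window_hours / 24.0)
    return total_bytes / available_secs
```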

Right. Fundamentally what we're trying to do here is schedule N IOs
against a cluster within time period T, ideally without impacting any
of the client IOs issued against the cluster. Depending on how many
client IOs there are, and when they come in, that might be easy or
might be impossible (because they're using up all the IO capacity in
the cluster themselves).

So scheduling options can help by directing the scrubbing to occur at
times when we don't expect client IO (at night or whatever), but in
the general case I don't think we can actually solve it. What we can
do is:
1) Try and figure out how much IO is required to do scrubbing, and
alert users if their configuration won't succeed,
2) Prioritize scrubbing traffic against client IO more effectively
than we do right now.

I think (2) has been on Sam's list of things to do for a while now, by
making scrub ops into normal operations that go through a shared work
queue with priority attached: http://tracker.ceph.com/issues/8635
There's not a lot of detail there unfortunately, but if scrubbing was
a regular operation it would solve many of the conflicts:
* low priority would mean that instead of being 1 IO at a time, it
would get time roughly proportional to its priority
* scheduling would make it less likely to hit conflicts, and mean that
in future we could even get clever about avoiding or backing off scrub
on something in use by clients
* it more naturally spreads out the scrub workload and makes it
quickly apparent if the requested scrub rate is unsustainable (just
track the rate of scrub completions against what we'd need it to be
for success)
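That last check is cheap to state: compare the achieved completion rate against the rate needed to cover every PG once per interval (a sketch of the bookkeeping, with invented names):

```python
def scrub_on_track(pgs_scrubbed, elapsed_secs, total_pgs, interval_secs):
    """True if the observed scrub completion rate is at least the rate
    needed to scrub every PG once per configured interval."""
    needed_rate = total_pgs / interval_secs
    achieved_rate = pgs_scrubbed / elapsed_secs
    return achieved_rate >= needed_rate
```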

I think that's probably a useful first step, and I'm pretty sure the
general case doesn't have a closed-form solution so I'm leery of
trying to build up a big system before it's in place.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



