On Mon, 9 Feb 2015, GuangYang wrote:
> Hi Sage,
> Another potential problem with scrub scheduling, as observed in our
> deployment (2PB cluster, 70% full), was that some PGs hadn't been
> scrubbed for 1.5 months, even though we have the configuration to do
> deep scrubbing weekly.
>
> With our deployment and how full the cluster is, as well as the
> conservative setting for scrubbing (osd_max_scrubs = 1), one round of
> scrubbing would not finish in one week, so we probably should
> schedule it monthly (with weekly shallow scrubbing).

This is simply a function of the amount of data and speed of the
backend, right?  Nothing we can fix in Ceph?
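A back-of-the-envelope check of the numbers above bears this out (a
sketch only, assuming "2PB" means 2 PB of raw capacity and that a deep
scrub pass must read every stored byte):

    // Aggregate read throughput needed to deep scrub a 70%-full 2 PB
    // cluster once per week.  Illustrative only; not Ceph code.
    #include <cstdio>

    int main() {
      const double raw_bytes  = 2e15;            // 2 PB raw capacity
      const double used_bytes = 0.70 * raw_bytes;
      const double week_secs  = 7 * 24 * 3600.0;
      std::printf("%.2f GB/s aggregate\n",
                  used_bytes / week_secs / 1e9); // ~2.31 GB/s
      return 0;
    }

Sustaining ~2.3 GB/s of scrub reads on top of client IO, with
osd_max_scrubs = 1 per OSD, is only possible if the backend has that
much spare throughput, which is the point Sage makes above.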
> Another problem is that currently the scheduling of scrub is
> optimized locally at each OSD: among the PGs for which this OSD acts
> as primary, it selects the one which hasn't been scrubbed for the
> longest time, puts it forward as the candidate, and requests the
> scrub reservation from all replicas.  Since each OSD can only have 1
> active scrub, that active slot could potentially always be occupied
> by a replica; as a result, the PGs whose primary is this OSD fail to
> schedule and are left behind.
>
> Is this issue worth an enhancement?

Good point.  Yeah, I think it's definitely worth fixing!

sage

> Thanks,
> Guang
>
> ----------------------------------------
> > Date: Sun, 8 Feb 2015 13:38:28 -0800
> > From: sweil@xxxxxxxxxx
> > To: ceph-devel@xxxxxxxxxxxxxxx
> > CC: simon.leinen@xxxxxxxxx
> > Subject: scrub scheduling
> >
> > Simon Leinen at Switch did a great post recently about the impact
> > of scrub on their cluster(s):
> >
> > http://blog.simon.leinen.ch/2015/02/ceph-deep-scrubbing-impact.html
> >
> > Basically the 2 week deep scrub interval kicks in on exactly a 2
> > week cycle and the cluster goes crazy for a few hours and then does
> > nothing (but client IO) for the next two weeks.
> >
> > The options governing this are:
> >
> > OPTION(osd_scrub_min_interval, OPT_FLOAT, 60*60*24)    // if load is low
> > OPTION(osd_scrub_max_interval, OPT_FLOAT, 7*60*60*24)  // regardless of load
> > OPTION(osd_deep_scrub_interval, OPT_FLOAT, 60*60*24*7) // once a week
> > OPTION(osd_scrub_load_threshold, OPT_FLOAT, 0.5)
> >
> > That is, if the load is < .5 (probably almost never on a real
> > cluster) it will scrub every day; otherwise (regardless of load) it
> > will scrub each PG at least once a week.
> >
> > Several things we can do here:
> >
> > 1- Maybe the shallow scrub interval should be less than the deep
> > scrub interval?
> >
> > 2- There is a new feature for hammer that limits scrub to certain
> > times of day, contributed by Xinze Chi:
> >
> > OPTION(osd_scrub_begin_hour, OPT_INT, 0)
> > OPTION(osd_scrub_end_hour, OPT_INT, 24)
> >
> > That is, by default, scrubs can happen at any time.  You can use
> > this to limit them to certain hours of the night, or whatever is
> > appropriate for your cluster.  That only sort of helps, though;
> > Simon's scrub frenzy will still happen one day a week, all at once
> > (or maybe spread over 2 nights).
> >
> > 3- We can spread them out during the allowed window.  But how to do
> > that?  We could make the scrub interval randomly +/- a value of up
> > to 50% of the total interval.  Or we could somehow look at the
> > current rate of scrubbing (average time to completion for the
> > current pool, maybe)... and/or look at the total number of items in
> > the scrub queue?
> >
> > 4- Ric pointed out to me that even if we spread these out,
> > scrubbing at full speed has an impact.  Even if we do all the
> > prioritization magic we can, there will still be a bunch of large
> > IOs in the queue.  What if we have a hard throttle on the scrub
> > rate, in objects per second and/or bytes per second?  In the end
> > the same number of IOs traverse the queue and potentially interfere
> > with client IO, but they would be spread out over a longer period
> > of time and be less noticeable (i.e., slow down client IOs from
> > different workloads and not all the same workload).  I'm not
> > totally convinced this is an improvement over a strategy where we
> > have only 1 scrub IO in flight at all times, but that isn't quite
> > how scrub schedules itself so it's hard to compare it that way, and
> > in the end the user experience and perceived impact should be
> > lower...
> >
> > 5- Auto-adjust the above scrub rate based on the total amount of
> > data, the scrub interval, and the scrub hours so that we are
> > scrubbing at the slowest rate possible that meets the schedule.
> > We'd have to be slightly clever to have the right feedback in place
> > here...
> >
> > Thoughts?
> > sage
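A minimal sketch of the randomization idea in (3), in C++; the function
name and signature are illustrative, not actual Ceph code:

    // Jitter each PG's next scrub deadline by up to +/-50% of the
    // configured interval, so PGs last scrubbed at the same time
    // drift apart instead of all becoming eligible at the same
    // instant.
    #include <random>

    double next_scrub_due(double last_scrub_stamp, double scrub_interval)
    {
      static std::mt19937 gen{std::random_device{}()};
      std::uniform_real_distribution<double> jitter(0.5, 1.5);
      return last_scrub_stamp + scrub_interval * jitter(gen);
    }

One nice property: because the offset is re-drawn every cycle, an
initially synchronized population of PGs decorrelates after a few
intervals without any global coordination.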
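And a sketch of how (4) and (5) might fit together: derive the slowest
bytes-per-second rate that still meets the deep scrub deadline within
the allowed daily window, then enforce it with a simple token bucket.
All names here are illustrative assumptions; none of this is actual
Ceph code.

    #include <algorithm>
    #include <cstdint>

    // (5) Slowest rate that still scrubs total_bytes once per
    // interval, given that scrubbing is only allowed between
    // begin_hour and end_hour each day (assumed to be a non-empty
    // window).
    double target_scrub_rate(uint64_t total_bytes,
                             double deep_scrub_interval_sec,
                             int begin_hour, int end_hour)
    {
      double window_fraction = (end_hour - begin_hour) / 24.0;
      return total_bytes / (deep_scrub_interval_sec * window_fraction);
    }

    // (4) Token bucket: a scrub chunk is admitted only while budget
    // remains, spreading the same IO over the whole window instead of
    // letting it burst.  Assumes chunks are <= one second of budget.
    struct ScrubThrottle {
      double rate;        // bytes per second, from target_scrub_rate()
      double tokens = 0;  // available budget, in bytes
      double last = 0;    // time of last refill, in seconds

      bool admit(double now, uint64_t chunk_bytes) {
        tokens = std::min(tokens + (now - last) * rate, rate); // 1s burst cap
        last = now;
        if (tokens < static_cast<double>(chunk_bytes))
          return false;   // defer this chunk and retry later
        tokens -= chunk_bytes;
        return true;
      }
    };

The feedback loop Sage mentions in (5) could then be as simple as
recomputing the rate periodically as the amount of stored data changes.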