Re: scrub randomization and load threshold

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Mon, 16 Nov 2015 16:32:11 +0100

On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Mon, 16 Nov 2015, Dan van der Ster wrote:
>> Instead of keeping a 24hr loadavg, how about we allow scrubs whenever
>> the loadavg is decreasing (or below the threshold)? As long as the
>> 1min loadavg is less than the 15min loadavg, we should be ok to allow
>> new scrubs. If you agree I'll add the patch below to my PR.
>
> I like the simplicity of that, but I'm afraid it's going to just trigger a
> feedback loop and oscillations on the host.  I.e., as soon as we see *any*
> decrease, all osds on the host will start to scrub, which will push the
> load up.  Once that round of PGs finish, the load will start to drop
> again, triggering another round.  This'll happen regardless of whether
> we're in the peak hours or not, and the high-level goal (IMO at least) is
> to do scrubbing in non-peak hours.

We checked our OSDs' 24hr loadavg plots today and found that the
original idea of 0.8 * 24hr loadavg wouldn't leave many chances for
scrubs to run. With a factor of 0.9 or 1.0 it might be doable.
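
To make the effect of the factor concrete, here is a rough sketch of
the daily-average rule. The function names and the load samples are
made up for illustration; they are not taken from our plots or from
the OSD code.

```python
def scrub_allowed_daily(load1, daily_avg, factor=0.8):
    """Allow a new scrub only while the 1min loadavg is below
    factor * (24hr average loadavg). Hypothetical helper."""
    return load1 < factor * daily_avg


def allowed_fraction(samples, factor):
    """Fraction of the given loadavg samples during which a scrub
    could start, using the samples' own mean as the daily average."""
    daily_avg = sum(samples) / len(samples)
    allowed = sum(1 for s in samples
                  if scrub_allowed_daily(s, daily_avg, factor))
    return allowed / len(samples)


# Invented samples for a host whose load varies between 1 and 3.
samples = [1.0, 1.75, 2.25, 3.0]
frac_08 = allowed_fraction(samples, 0.8)
frac_10 = allowed_fraction(samples, 1.0)
```

On a host whose load stays close to its daily average, a factor of
0.8 leaves only the deepest dips available for scrubbing, and raising
the factor widens that window.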

BTW, I realized there was a silly error in that earlier patch, and we
need an upper bound anyway, say the number of CPUs. So until your
response came I was working with this idea:
https://stikked.web.cern.ch/stikked/view/raw/5586a912
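
In case the paste expires, the idea is roughly the following. This is
only a sketch of the rule as described in this thread, not the patch
itself: the function names, the argument names and the exact ordering
of the checks are assumptions.

```python
import os


def scrub_load_ok(load1, load15, threshold, ncpus):
    """Decide whether a new scrub may start (hypothetical sketch).

    load1, load15: 1min and 15min load averages
    threshold:     the configured scrub load threshold
    ncpus:         upper bound on the load, here the number of CPUs
    """
    # Upper bound: never start a scrub on an overloaded host.
    if load1 >= ncpus:
        return False
    # Below the configured threshold scrubs are always allowed.
    if load1 < threshold:
        return True
    # Otherwise allow scrubs only while the load is decreasing.
    return load1 < load15


def scrub_load_ok_now(threshold):
    """Same check against the live host values (Unix only)."""
    load1, _load5, load15 = os.getloadavg()
    ncpus = os.cpu_count() or 1
    return scrub_load_ok(load1, load15, threshold, ncpus)
```

Note that this still has the feedback problem you describe: every OSD
on the host sees the same loadavg, so they would all pass the
"decreasing" check at the same moment.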

-- dan