Re: scrub randomization and load threshold

On Mon, Nov 16, 2015 at 4:58 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Mon, Nov 16, 2015 at 4:32 PM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> On Mon, Nov 16, 2015 at 4:20 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>> On Mon, 16 Nov 2015, Dan van der Ster wrote:
>>>> Instead of keeping a 24hr loadavg, how about we allow scrubs whenever
>>>> the loadavg is decreasing (or below the threshold)? As long as the
>>>> 1min loadavg is less than the 15min loadavg, we should be ok to allow
>>>> new scrubs. If you agree I'll add the patch below to my PR.
>>>
>>> I like the simplicity of that, but I'm afraid it's going to just trigger a
>>> feedback loop and oscillations on the host.  I.e., as soon as we see *any*
>>> decrease, all osds on the host will start to scrub, which will push the
>>> load up.  Once that round of PGs finish, the load will start to drop
>>> again, triggering another round.  This'll happen regardless of whether
>>> we're in the peak hours or not, and the high-level goal (IMO at least) is
>>> to do scrubbing in non-peak hours.
>>
>> We checked our OSDs' 24hr loadavg plots today and found that the
>> original idea of 0.8 * 24hr loadavg wouldn't leave many chances for
>> scrubs to run. So maybe if we used 0.9 or 1.0 it would be doable.
>>
>> BTW, I realized there was a silly error in that earlier patch, and we
>> anyway need an upper bound, say # cpus. So until your response came I
>> was working with this idea:
>> https://stikked.web.cern.ch/stikked/view/raw/5586a912
>
> Sorry, that link is behind SSO. Here:
>
> https://gist.github.com/dvanders/f3b08373af0f5957f589

Hi again. Here's a first shot at a daily loadavg heuristic:
https://github.com/ceph/ceph/commit/15474124a183c7e92f457f836f7008a2813aa672
I had to guess where it would be best to store the daily_loadavg
member and where to initialize it... please advise.
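
For readers without the commit at hand, one way to keep a daily loadavg is to
fold periodic 1-minute loadavg samples into a running average. The sketch below
is an illustration only and is not taken from the commit: the class name
DailyLoadAvg, the one-sample-per-minute default, and the averaging formula are
all assumptions.

```python
import os


class DailyLoadAvg:
    """Hypothetical helper (not the code from the commit): approximates a
    24-hour load average from periodic 1-minute loadavg samples."""

    def __init__(self, samples_per_day=24 * 60, initial=None):
        # Assumption: one sample per minute, so 1440 samples span a day.
        self.samples_per_day = samples_per_day
        # Seed with the current 1-minute loadavg unless a value is supplied.
        self.value = os.getloadavg()[0] if initial is None else initial

    def update(self, sample=None):
        """Fold one new sample into the running average and return it."""
        if sample is None:
            sample = os.getloadavg()[0]
        n = self.samples_per_day
        # Each new sample carries weight 1/n; older history decays slowly.
        self.value = (self.value * (n - 1) + sample) / n
        return self.value
```

Calling update() once per sampling interval keeps the estimate current; seeding
from the current load means the value is only meaningful after the process has
been running for a while.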

I took the conservative approach of triggering scrubs when either:
   1m loadavg < osd_scrub_load_threshold, or
   1m loadavg < 24hr loadavg && 1m loadavg < 15m loadavg
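
Written out as a predicate, the two conditions above look roughly like this.
It is a minimal sketch, not the code from the PR: the function and parameter
names are hypothetical, and the upper bound on load (e.g. # cpus) discussed
earlier in the thread is left out.

```python
def scrub_load_ok(load_1m, load_15m, load_daily, threshold):
    """Return True if a new scrub may start under the two conditions above.

    threshold stands in for osd_scrub_load_threshold; load_daily is the
    24hr loadavg. All names here are illustrative assumptions.
    """
    # Condition 1: the host is simply below the configured threshold.
    if load_1m < threshold:
        return True
    # Condition 2: load is below the daily average AND currently decreasing
    # (1m loadavg under the 15m loadavg).
    return load_1m < load_daily and load_1m < load_15m
```

For example, with a threshold of 0.5, a host at 1m=2.0, 15m=3.0, 24hr=2.5 would
be allowed to scrub (below the daily average and falling), while the same host
at 15m=1.5 would not (load is rising).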

The whole PR would become this:
https://github.com/ceph/ceph/compare/master...cernceph:wip-deepscrub-daily

-- Dan


