Re: How to configure something like osd_deep_scrub_min_interval?

Quick answers:


  *   ... osd_deep_scrub_randomize_ratio ... but not on Octopus: is it still a valid parameter?

Yes, this parameter exists and can be used to prevent premature deep-scrubs. The effect is dramatic.
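
In case the missing documentation makes you doubt it: you can verify the option directly on your cluster with the standard config introspection commands (nothing below is specific to my setup):

  # show the option's description, type and default as known to this release
  ceph config help osd_deep_scrub_randomize_ratio
  # show the value currently in effect for the OSDs
  ceph config get osd osd_deep_scrub_randomize_ratio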


  *   ... essentially by playing with osd_scrub_min_interval,...

The main parameter is actually osd_deep_scrub_randomize_ratio; all other parameters have less effect in terms of scrub load. osd_scrub_min_interval is the second most important parameter and needs to be increased for large SATA/NL-SAS HDDs. For sufficiently fast drives the default of 24h is fine (although it might be a bit aggressive/paranoid).
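
To make the two knobs concrete, setting them cluster-wide would look something like this (the values are placeholders to illustrate the direction, not a recommendation; tune them to your drives):

  # main knob: disable the random early deep-scrubs (see the quoted post below)
  ceph config set osd osd_deep_scrub_randomize_ratio 0
  # second knob: raise the lower bound between regular scrubs, value in seconds
  # (example: 3 days instead of the 24h default, for large/slow HDDs)
  ceph config set osd osd_scrub_min_interval 259200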


  *   Another small question: you opt for osd_max_scrubs=1 just to make sure
your I/O is not adversely affected by scrubbing, or is there a more
profound reason for that?

Well, not affecting user IO too much is quite a profound reason, and many admins try to avoid scrubbing altogether while users are on the system. Scrubbing makes IO somewhat unpredictable and can trigger user complaints.

However, there is another profound reason: for HDDs a higher value increases the deep-scrub load (that is, the interference with user IO) a lot while actually slowing the deep-scrubbing down. HDDs do not handle the random IO implied by concurrent deep-scrubs well. On my system I saw that with osd_max_scrubs=2 the scrub time for a PG increased to a bit more than double. In other words: more scrub load and less scrub progress, so it is useless; do not do this.
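
For reference, keeping this setting in place and checking the per-PG deep-scrub stamps needs nothing beyond the standard CLI (1 is the default anyway, the set command merely pins it explicitly):

  # keep (deep-)scrub concurrency per OSD at 1
  ceph config set osd osd_max_scrubs 1
  # per-PG scrub/deep-scrub timestamps (SCRUB_STAMP / DEEP_SCRUB_STAMP columns)
  # show how the deep-scrub stamps spread out over time after a change
  ceph pg dump pgs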

I plan to document the script a bit more and am waiting for some deep-scrub histograms to converge to equilibrium. This takes months for our large pools, but I would like to have the numbers for an example of what it should look like.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________
From: Fulvio Galeazzi
Sent: Monday, January 8, 2024 4:21 PM
To: Frank Schilder; ceph-users@xxxxxxx
Subject: Re:  Re: How to configure something like osd_deep_scrub_min_interval?

Hallo Frank,
        just found this post, thank you! I have also been puzzled/struggling
with scrub/deep-scrub and found your post very useful: I will give this a
try soon.

One thing first: I am using Octopus, too, but I cannot find any
documentation about osd_deep_scrub_randomize_ratio. I do see it in
past releases, but not in Octopus: is it still a valid parameter?

Let me check whether I understood your procedure: you optimize scrub
time distribution essentially by playing with osd_scrub_min_interval,
thus "forcing" the automated algorithm to preferentially select
older-scrubbed PGs, am I correct?

Another small question: you opt for osd_max_scrubs=1 just to make sure
your I/O is not adversely affected by scrubbing, or is there a more
profound reason for that?

   Thanks!

                Fulvio

On 12/13/23 13:36, Frank Schilder wrote:
> Hi all,
>
> since there seems to be some interest, here some additional notes.
>
> 1) The script is tested on Octopus. It seems that there was a change in the output of the ceph commands used, and it might need some tweaking to work on other versions.
>
> 2) If you want to give my findings a shot, you can do so in a gradual way. The most important change is setting osd_deep_scrub_randomize_ratio=0 (with osd_max_scrubs=1); this makes osd_deep_scrub_interval work exactly like the requested osd_deep_scrub_min_interval setting: PGs with a deep-scrub stamp younger than osd_deep_scrub_interval will *not* be deep-scrubbed. This is the one change to test; all other settings have less impact. The script will not report some of the numbers at the end, but the histogram will be correct. Let it run for a few deep-scrub-interval rounds until the histogram has evened out.
>
> If you start your test after using osd_max_scrubs>1 for a while (as I did), you will need a lot of patience and might have to mute some scrub warnings for a while.
>
> 3) The changes are mostly relevant for large HDDs that take a long time to deep-scrub (many small objects). The overall load reduction, however, is useful in general.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



