Re: Help with deep scrub warnings (probably a bug ... set on pool for effect)

I had the same problem as you....

The only solution that worked for me was to set it on the pools (the variable values here are examples, mirroring the intervals discussed further down; substitute your own):

    smaxi=1209600   # scrub_max_interval: 2 weeks, in seconds
    smini=259200    # scrub_min_interval: 3 days
    dsi=1209600     # deep_scrub_interval: 2 weeks
    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" scrub_max_interval "$smaxi"
        ceph osd pool set "$pool" scrub_min_interval "$smini"
        ceph osd pool set "$pool" deep_scrub_interval "$dsi"
    done

which is insane... so it has to be a bug, but I didn't report it because something like this should be so obvious that I figured it had to be just me, right?
By the way, this happened to me right when the clock rolled over to 2024, without any other scrub-related changes... so maybe that's somehow the cause. I looked in the source code to try to solve this, but couldn't find any such bug. (And that's how I found out that those per-pool settings even exist.)
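
If you want to double-check whether a pool already carries these overrides, I believe the values can be read back per pool (pool name "mypool" is just a placeholder):

    ceph osd pool get mypool scrub_min_interval
    ceph osd pool get mypool scrub_max_interval
    ceph osd pool get mypool deep_scrub_interval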

I also find the randomize-ratio stuff doesn't work that well, and I made those smaller too. It would constantly scrub things way too early, blocking the ones that are late or about to be late (after recovery, for example, some become late soon afterwards). Note that the two ratios do different things...

osd_scrub_interval_randomize_ratio multiplies the interval to spread out the scheduling, which is okay but not great... I think what would make sense is to do the *next* one early once the cluster is idle, not a *random* one early at a *random* time, but it's good enough to eventually spread things out. Shallow scrubs are fast and low-impact, so I don't think you have to worry about this at all. You may be "just bunching them up"... so you want to verify that (pg dump, sort by timestamp), but I found it doesn't matter in practice: as long as the number running at any given time stays below osd_max_scrubs, I don't really care.
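
To check for that bunching, something like this works (a sketch; the jq path assumes the JSON layout of recent Ceph releases, so adjust if yours differs; swap in last_deep_scrub_stamp for deep scrubs):

    ceph pg dump pgs --format json 2>/dev/null \
        | jq -r '.pg_stats[] | "\(.last_scrub_stamp) \(.pgid)"' \
        | sort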

But I think the other one is terrible, especially if your deep scrubs run a lot less often than your shallow scrubs. osd_deep_scrub_randomize_ratio randomly upgrades shallow scrubs into deep scrubs, which are far more IO-intensive. Again, I disagree that it makes sense to do this on a random PG at a mostly random time instead of, say, doing the next scheduled deep scrub a bit early when load is low; but in this case it matters because of how much more IO-intensive deep scrubs are. Any time I see late scrubs (frequent after recovery), I also see it deep-scrubbing things way too early while blocking the late ones, so it takes very long to finally get all scrubs done. I changed it to 0.000001 so it doesn't bother me now.
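
For reference, that change is a one-liner at the osd level (the value is the one from the paragraph above):

    ceph config set osd osd_deep_scrub_randomize_ratio 0.000001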

Peter

On 2024-03-05 07:58, Anthony D'Atri wrote:
* Try applying the settings to global so that mons/mgrs get them.

* Set your shallow scrub settings back to the default.  Shallow scrubs take very few resources.

* Set your randomize_ratio back to the default, you’re just bunching them up

* Set the load threshold back to the default; I can’t imagine any OSD node ever having a load < 0.3, so you’re basically keeping scrubs from ever running

* osd_deep_scrub_interval is the only thing you should need to change.
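
A sketch of that cleanup, assuming the settings were applied at the osd level as in the output below (ceph config rm reverts an option to its default):

    for opt in osd_scrub_sleep osd_scrub_load_threshold \
               osd_deep_scrub_randomize_ratio \
               osd_scrub_min_interval osd_scrub_max_interval; do
        ceph config rm osd "$opt"    # back to the default
    done
    ceph config set global osd_deep_scrub_interval 1209600    # 2 weeks, visible to mons/mgrs too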

On Mar 5, 2024, at 2:42 AM, Nicola Mori <mori@xxxxxxxxxx> wrote:

Dear Ceph users,

in order to reduce the deep scrub load on my cluster I set the deep scrub interval to 2 weeks, and tuned other parameters as follows:

# ceph config get osd osd_deep_scrub_interval
1209600.000000
# ceph config get osd osd_scrub_sleep
0.100000
# ceph config get osd osd_scrub_load_threshold
0.300000
# ceph config get osd osd_deep_scrub_randomize_ratio
0.100000
# ceph config get osd osd_scrub_min_interval
259200.000000
# ceph config get osd osd_scrub_max_interval
1209600.000000
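
In human units (a quick sanity check of the values above):

    echo $(( 14 * 24 * 3600 ))   # 1209600 = deep_scrub_interval and scrub_max_interval (2 weeks)
    echo $((  3 * 24 * 3600 ))   # 259200  = scrub_min_interval (3 days)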

To my admittedly poor knowledge of Ceph's deep scrub procedures, these settings should spread the deep scrub operations over two weeks instead of the default one week, lowering the scrub frequency and the related load. But I'm currently getting warnings like:

[WRN] PG_NOT_DEEP_SCRUBBED: 56 pgs not deep-scrubbed in time
    pg 3.1e1 not deep-scrubbed since 2024-02-22T00:22:55.296213+0000
    pg 3.1d9 not deep-scrubbed since 2024-02-20T03:41:25.461002+0000
    pg 3.1d5 not deep-scrubbed since 2024-02-20T09:52:57.334058+0000
    pg 3.1cb not deep-scrubbed since 2024-02-20T03:30:40.510979+0000
    . . .

I don't understand the first one: since the deep scrub interval should be two weeks, I don't expect warnings for PGs that were deep-scrubbed less than 14 days ago (at the moment I'm writing it's Tue Mar  5 07:39:07 UTC 2024).

Moreover, I don't understand why the deep scrub for so many PGs is lagging behind. Is there something wrong in my settings?

Thanks in advance for any help,

Nicola


--
--------------------------------------------
Peter Maloney
Brockmann Consult GmbH
www.brockmann-consult.de
Chrysanderstr. 1
D-21029 Hamburg, Germany
Tel: +49 (0)40 69 63 89 - 320
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Amtsgericht Hamburg HRB 157689
Geschäftsführer Dr. Carsten Brockmann
--------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



