Re: pgs not deep-scrubbed in time

Hi,

check out the docs [0] or my blog post [1]. Either set the new interval globally, or set it for the mgr as well; otherwise the mgr will still check against the default interval.
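Roughly like this (untested as pasted, adjust the value to your interval):

ceph config set global osd_deep_scrub_interval 2592000
# or keep the osd-level setting and add one for the mgr:
ceph config set mgr osd_deep_scrub_interval 2592000

# verify what the mgr actually sees:
ceph config get mgr osd_deep_scrub_interval

The warning threshold is additionally scaled by mon_warn_pg_not_deep_scrubbed_ratio (0.75 by default), if I remember correctly.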

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/rados/operations/health-checks/#first-method
[1] http://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/

Quoting Jan Kasprzak <kas@xxxxxxxxxx>:

Hello, Ceph users,

a question/problem related to deep scrubbing:

I have an HDD-based Ceph 18 cluster, currently with 34 osds and about 600 pgs.
To avoid the latency peaks which apparently correlate with an HDD being
100 % busy for several hours during a deep scrub, I wanted to relax the
scrubbing frequency and concurrency. Six days ago I modified
the following config parameters:

ceph config set osd osd_scrub_max_interval  2592000   # was 604800
ceph config set osd osd_deep_scrub_interval 2592000   # was 604800
ceph config set osd osd_max_scrubs          1         # was 3
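As a sanity check, the value a given daemon actually uses can be confirmed
with something like:

ceph config show osd.0 osd_deep_scrub_interval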

The intervals are roughly four times longer (7 days to 30 days) and
osd_max_scrubs is a third of its previous value, so the total scrubbing load
should _decrease_. However, just a few hours after the config change,
my cluster went to HEALTH_WARN with "XX pgs not deep-scrubbed in time".

I thought this was something temporary, but six days later the number
of pgs not deep-scrubbed in time is still growing; it is now 58.
In "ceph -s" there are about 6-8 pgs in active+clean+scrubbing
and 4-6 in active+clean+scrubbing+deep at all times, so scrubbing
is still happening.

According to "ceph pg dump", a deep scrub of a single pg takes about 9000 seconds.
All pgs seem to be scheduled for scrubbing within the next two days
(a rough throughput estimate follows the listing):

# for i in `seq 18 24`; do echo -n "2024-12-$i "; ceph pg dump 2>/dev/null | grep -c "scheduled @ 2024-12-$i"; done
2024-12-18 152
2024-12-19 422
2024-12-20 7
2024-12-21 0
2024-12-22 0
2024-12-23 0
2024-12-24 0
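Back of the envelope, assuming replicated pools with size 3 (so each deep
scrub keeps ~3 osds busy) and osd_max_scrubs=1, at most about 34/3 = 11 deep
scrubs can run concurrently:

# ~11 concurrent deep scrubs, 600 pgs, ~9000 s each:
echo $(( 600 * 9000 / 11 ))    # ~490909 s, i.e. under 6 days for a full pass

so raw scrub throughput should easily fit into the 30-day interval.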

The positive thing is that, latency-wise, it helped: with at most one pg being
deep-scrubbed per OSD at any time, HDD utilization never gets near 100 %;
it stays at ~60 % while deep scrubbing is in progress on that OSD.

Is there any other config parameter which I should modify together
with the above three parameters?

Thanks!

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


