Hello,
I have a strange issue and hope you can enlighten me as to why this happens
and how I can prevent it.
In ceph.conf i have:
... .. .
[osd]
osd_deep_scrub_interval = 2419200.000000
osd_scrub_max_interval = 2419200.000000
osd_scrub_begin_hour = 10 <= this works, great
osd_scrub_end_hour = 17 <= this works, great
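(Side note: as far as I know, the same values can also be injected into the
running OSDs without a restart, e.g. like this - only a sketch, option names
as above:)
# ceph tell osd.* injectargs '--osd_deep_scrub_interval=2419200 --osd_scrub_max_interval=2419200'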
But as you can see below, this does not seem to be respected by Ceph:
# zgrep -i "deep-scrub ok" ceph-osd.285* (logfile)
2021-10-21 14:31:12.180689 7facaf28b700 0 log_channel(cluster) log [DBG] : *1.752* deep-scrub ok
2021-10-20 13:17:31.502227 7facb0a8e700 0 log_channel(cluster) log [DBG] : 1.51 deep-scrub ok
2021-10-17 13:45:46.243041 7facafa8c700 0 log_channel(cluster) log [DBG] : 1.4c2 deep-scrub ok
2021-10-17 17:25:55.570801 7facb028d700 0 log_channel(cluster) log [DBG] : 1.81d deep-scrub ok
2021-10-16 11:36:58.695621 7facaf28b700 0 log_channel(cluster) log [DBG] : *1.752* deep-scrub ok
2021-10-16 16:11:50.399225 7facb0a8e700 0 log_channel(cluster) log [DBG] : 1.51 deep-scrub ok
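As a cross-check against the log, the timestamps the PG itself reports can
also be read from the PG stats; I am not 100% sure the field names are
identical in every release, so take this only as a sketch:
# ceph pg 1.752 query | grep -i deep_scrub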
i.e. *deep-scrub" on PG "1.752" (also same issue on i.e. "1.51") is done
- 2021-10-21 14:31:12
- 2021-10-16 11:36:58
there are only 5 days inbetween, if i understand this correct ceph
should wait approx. 4 Weeks (2419200 Seconds) before another deepscrub
of one PG has to be happen.
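Just for reference, this is where the 4 weeks come from:
# echo $((2419200 / 86400))
28
i.e. 2419200 seconds / 86400 seconds per day = 28 days = 4 weeks.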
The cluster is in a "HEALTH_OK" state (sometimes in HEALTH_WARN because of
slow requests), and I have checked that the config is in effect on the
OSD from this example (osd.285):
# ceph daemon osd.285 config show | grep "interval" | grep scrub
"mon_scrub_interval": "86400",
"osd_deep_scrub_interval": "2419200.000000",
"osd_scrub_interval_randomize_ratio": "0.500000",
"osd_scrub_max_interval": "2419200.000000",
"osd_scrub_min_interval": "86400.000000",
Does anyone know why this happens?
I hope you guys can help me understand this.
- Mehmet
Further information: all OSDs are *HDD*, distributed over 17 nodes.
# ceph -s
  cluster:
    id:     5d5095e2-e2c7-4790-a14c-86412d98d2dc
    health: HEALTH_WARN
            435 slow requests are blocked > 32 sec. Implicated osds 84

  services:
    mon: 3 daemons, quorum cmon01,cmon02,cmon03
    mgr: cmon01(active), standbys: cmon03, cmon02
    osd: 312 osds: 312 up, 312 in

  data:
    pools:   2 pools, 3936 pgs
    objects: 112M objects, 450 TB
    usage:   1352 TB used, 1201 TB / 2553 TB avail
    pgs:     3897 active+clean
             38   active+clean+scrubbing+deep
             1    active+clean+scrubbing

  io:
    client: 120 MB/s rd, 124 MB/s wr, 820 op/s rd, 253 op/s wr