Hello Ceph community,

Wanted to highlight one observation and hear whether any Squid users are having similar experiences.

Since upgrading to 19.2.0 (from 18.2.4) we have observed that PG deep-scrub times have increased drastically. Some PGs take 2-5 days to complete a deep scrub, while others run for 20+ days. This fills up the deep-scrub queue, and the cluster almost constantly has 'pgs not deep-scrubbed in time' warnings.

We average 67 PGs per OSD; on 15 TB HDDs this works out to 200 GB-ish PGs. While fairly large, these PGs did not cause such an increase in deep-scrub times on Reef.

"ceph pg dump | grep 'deep scrubbing for'" always shows a few entries of quite morbid scrubs like the following (1169147 s is roughly 13.5 days, 1871733 s roughly 21.7 days):

    7.3e   121289 0 0 0 0 225333247207 0 0 127 0  127 active+clean+scrubbing+deep 2024-11-13T09:37:42.549418+0000 490179'5220664  490179:23902923 [268,27,122] 268 [268,27,122] 268 483850'5203141  2024-11-02T11:33:57.835277+0000 472713'5197481  2024-10-11T04:30:00.639763+0000 0 21873  deep scrubbing for 1169147s
    34.247 62618  0 0 0 0 179797964677 0 0 101 50 101 active+clean+scrubbing+deep 2024-11-05T06:27:52.288785+0000 490179'22729571 490179:80672442 [34,97,25]   34  [34,97,25]   34  481331'22436869 2024-10-23T16:06:50.092439+0000 471395'22289914 2024-10-07T19:29:26.115047+0000 0 204864 deep scrubbing for 1871733s

Not pointing any fingers, but the Squid release announced "better scrub scheduling". Long scrub runtimes are not a scheduling matter as such, but perhaps that change had some impact causing this behaviour?

Scrubbing configuration (both 2678400 s intervals equal 31 days; 172800 s is 2 days):

    $ ceph config get osd | grep scrub
    global  advanced  osd_deep_scrub_interval                          2678400.000000
    global  advanced  osd_deep_scrub_large_omap_object_key_threshold   500000
    global  advanced  osd_max_scrubs                                   5
    global  advanced  osd_scrub_auto_repair                            true
    global  advanced  osd_scrub_max_interval                           2678400.000000
    global  advanced  osd_scrub_min_interval                           172800.000000

Cluster details (the backfilling is expected, caused by some manual reweights):

    cluster:
      id:     96df99f6-fc1a-11ea-90a4-6cb3113cb732
      health: HEALTH_WARN
              24 pgs not deep-scrubbed in time

    services:
      mon:        5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 4d)
      mgr:        ceph-node001.hgythj(active, since 11d), standbys: ceph-node002.jphtvg
      mds:        20/20 daemons up, 12 standby
      osd:        384 osds: 384 up (since 25h), 384 in (since 5d); 5 remapped pgs
      rbd-mirror: 2 daemons active (2 hosts)
      rgw:        64 daemons active (32 hosts, 1 zones)

    data:
      volumes: 1/1 healthy
      pools:   14 pools, 8681 pgs
      objects: 758.42M objects, 1.5 PiB
      usage:   4.6 PiB used, 1.1 PiB / 5.7 PiB avail
      pgs:     275177/2275254543 objects misplaced (0.012%)
               6807 active+clean
               989  active+clean+scrubbing+deep
               880  active+clean+scrubbing
               5    active+remapped+backfilling

    io:
      client:   37 MiB/s rd, 59 MiB/s wr, 1.72k op/s rd, 439 op/s wr
      recovery: 70 MiB/s, 38 objects/s

A thread of other users experiencing the same prolonged deep scrubs on 19.2.0:
https://www.reddit.com/r/ceph/comments/1guynak/strange_issue_where_scrubdeep_scrub_never_finishes/

Any hints or help would be greatly appreciated!

Thanks in advance,
Laimis J.
laimis.juzeliunas@xxxxxxxxxx
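
P.S. In case it helps anyone compare numbers: a minimal sketch for pulling the longest-running deep scrubs out of "ceph pg dump" (assuming the "deep scrubbing for <N>s" suffix shown above, and GNU awk/sort):

    # Print PG id and current deep-scrub duration, longest-running first.
    # "2>/dev/null" drops the "dumped pgs" note that ceph prints to stderr.
    ceph pg dump pgs 2>/dev/null \
        | awk '/deep scrubbing for/ {print $1, $NF}' \
        | sort -k2 -rn \
        | head -20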
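
For anyone poking at the same issue, something along these lines should surface the relevant per-OSD state (osd.268 being the primary of pg 7.3e in the dump above; substitute your own stuck PG and its primary):

    # Effective scrub settings on the primary OSD of the stuck PG
    ceph config show osd.268 | grep scrub

    # Manually kick a deep scrub of the stuck PG and inspect its scrubber state
    ceph pg deep-scrub 7.3e
    ceph pg 7.3e query | grep -i scrub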