Re: Squid: deep scrub issues

Nmz <nemesiz@xxxxxx> · Thu, 28 Nov 2024 17:40:54 +0200

Hello,

Can you try running 'ceph config set osd osd_mclock_profile high_recovery_ops' and see how it affects you?

For me, deep scrub on some PGs ran for about 20 hours. After I gave it more priority, 1-2 hours were enough for it to finish.
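
A minimal sketch of how the profile change above could be scripted so that it is easy to apply and undo. The helper names (`set_profile_cmd`, `reset_profile_cmd`, `run`) are hypothetical; only the `ceph config set` / `ceph config rm` commands and the profile names come from Ceph itself. It assumes it runs on a node with an admin keyring.

```python
import subprocess

# Profile names accepted by the osd_mclock_profile option.
MCLOCK_PROFILES = {"balanced", "high_client_ops", "high_recovery_ops", "custom"}


def set_profile_cmd(profile: str) -> list[str]:
    """Build the argv that switches all OSDs to the given mClock profile."""
    if profile not in MCLOCK_PROFILES:
        raise ValueError(f"unknown mClock profile: {profile!r}")
    return ["ceph", "config", "set", "osd", "osd_mclock_profile", profile]


def reset_profile_cmd() -> list[str]:
    """Build the argv that removes the override again."""
    return ["ceph", "config", "rm", "osd", "osd_mclock_profile"]


def run(cmd: list[str]) -> str:
    """Run a ceph CLI command and return its stdout (needs a working ceph CLI)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) to actually apply them.
    print(" ".join(set_profile_cmd("high_recovery_ops")))
    print(" ".join(reset_profile_cmd()))
```

Keeping the reset command next to the set command makes it harder to forget to drop the override once the scrub backlog has cleared.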

----- Original Message -----
From: Laimis Juzeliūnas <laimis.juzeliunas@xxxxxxxxxx>
To: ceph-users@xxxxxxx
Date: Wednesday, November 27, 2024, 12:36:41 AM
Subject:  Squid: deep scrub issues

> Hello Ceph community,

> Wanted to highlight one observation and gather any Squid users having similar experiences.
> Since upgrading to 19.2.0 (from 18.4.0) we have observed that pg deep scrubbing times have drastically increased. Some pgs take 2-5 days to complete deep scrubbing while others increase to 20+ days. This causes the deep scrubbing queue to fill up and the cluster almost constantly has 'pgs not deep-scrubbed in time' alerts.
> We have on average 67 PGs per OSD; on 15 TB HDDs this results in PGs of roughly 200 GB. While fairly large, these PGs did not cause such an increase in deep scrub times on Reef.
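
The "200GB-ish" figure can be checked with a rough back-of-the-envelope sketch. It assumes decimal units for the 15 TB drives, an even spread of data across PGs, and takes the fill level from the cluster status quoted further down (4.6 PiB used of 5.7 PiB); none of this accounts for pools of different sizes.

```python
TB = 10**12
GB = 10**9

osd_capacity_bytes = 15 * TB  # "15TB hdd disks" from the post
pgs_per_osd = 67              # average PGs per OSD from the post

# Upper bound: average per-PG data on an OSD if the disk were completely full.
max_pg_bytes = osd_capacity_bytes / pgs_per_osd

# Scale by the cluster-wide fill level (4.6 PiB used of 5.7 PiB total).
fill_ratio = 4.6 / 5.7
est_pg_bytes = max_pg_bytes * fill_ratio

print(f"full-OSD bound: {max_pg_bytes / GB:.0f} GB per PG")
print(f"at {fill_ratio:.0%} full: {est_pg_bytes / GB:.0f} GB per PG")
```

Both numbers land in the same range as the byte counts of the two PGs listed in the `ceph pg dump` output.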

> "ceph pg dump | grep 'deep scrubbing for'" will always have a few entries of quite morbid scrubs like the following:
> 7.3e      121289                   0         0          0        0  225333247207            0           0   127         0       127  active+clean+scrubbing+deep  2024-11-13T09:37:42.549418+0000     490179'5220664    490179:23902923   [268,27,122]         268   [268,27,122]             268     483850'5203141  2024-11-02T11:33:57.835277+0000     472713'5197481  2024-10-11T04:30:00.639763+0000              0                21873  deep scrubbing for 1169147s
> 34.247     62618                   0         0          0        0  179797964677            0           0   101        50       101  active+clean+scrubbing+deep  2024-11-05T06:27:52.288785+0000    490179'22729571    490179:80672442     [34,97,25]          34     [34,97,25]              34    481331'22436869  2024-10-23T16:06:50.092439+0000    471395'22289914  2024-10-07T19:29:26.115047+0000              0               204864  deep scrubbing for 1871733s
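
To make durations like these easier to read, here is a small sketch that extracts the "deep scrubbing for Ns" field and converts it to days. It assumes the plain-text `ceph pg dump` layout shown above, with the PG id as the first column; the function name `scrub_durations` is hypothetical, and the sample lines are abbreviated with `...` for readability.

```python
import re
from datetime import timedelta

# Matches the trailing "deep scrubbing for <N>s" field shown in the listing.
SCRUB_RE = re.compile(r"deep scrubbing for (\d+)s")


def scrub_durations(pg_dump_text: str) -> dict[str, timedelta]:
    """Map PG id -> time spent so far in the current deep scrub."""
    durations = {}
    for line in pg_dump_text.splitlines():
        match = SCRUB_RE.search(line)
        fields = line.split()
        if match and fields:
            durations[fields[0]] = timedelta(seconds=int(match.group(1)))
    return durations


# Abbreviated copies of the two lines from the post.
sample = """\
7.3e 121289 0 0 0 0 225333247207 ... active+clean+scrubbing+deep ... deep scrubbing for 1169147s
34.247 62618 0 0 0 0 179797964677 ... active+clean+scrubbing+deep ... deep scrubbing for 1871733s
"""

durations = scrub_durations(sample)
# Longest-running scrubs first.
for pgid, spent in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pgid}: {spent.total_seconds() / 86400:.1f} days")
```

For the two PGs above this works out to roughly 13.5 and 21.7 days of continuous deep scrubbing.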

> Not pointing any fingers, but the Squid release announced "better scrub scheduling".
> This is not directly about scheduling, but could that change have had an impact that causes this behaviour?

> Scrubbing configurations:
> ceph config get osd | grep scrub
> global        advanced  osd_deep_scrub_interval                         2678400.000000
> global        advanced  osd_deep_scrub_large_omap_object_key_threshold  500000
> global        advanced  osd_max_scrubs                                  5
> global        advanced  osd_scrub_auto_repair                           true
> global        advanced  osd_scrub_max_interval                          2678400.000000
> global        advanced  osd_scrub_min_interval                          172800.000000
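
For context, the intervals above convert to days as sketched below, together with a very rough estimate of the average read rate each OSD would need to sustain to deep scrub all of its data once per interval. The estimate assumes the 4.6 PiB raw usage and 384 OSDs from the cluster status in this message, an even data spread, and that every stored byte is read exactly once per interval; it is a sanity check, not a model of how the scrubber actually schedules work.

```python
DAY = 86400  # seconds per day

# Values from the "ceph config get osd | grep scrub" output above.
osd_deep_scrub_interval = 2678400.0
osd_scrub_max_interval = 2678400.0
osd_scrub_min_interval = 172800.0

deep_days = osd_deep_scrub_interval / DAY
max_days = osd_scrub_max_interval / DAY
min_days = osd_scrub_min_interval / DAY
print(f"deep scrub interval: {deep_days:.0f} days")
print(f"scrub max interval:  {max_days:.0f} days")
print(f"scrub min interval:  {min_days:.0f} days")

# Rough required deep scrub read rate per OSD (assumptions in the text above).
MIB_PER_PIB = 1024**3
raw_used_mib = 4.6 * MIB_PER_PIB  # "4.6 PiB used"
osd_count = 384                   # "384 osds"
mib_per_osd = raw_used_mib / osd_count
required_rate = mib_per_osd / osd_deep_scrub_interval
print(f"average read rate needed per OSD: {required_rate:.1f} MiB/s")
```

Under these assumptions the required average comes out at a few MiB/s per OSD, which is why single PGs taking weeks to finish looks like a scheduling or priority problem rather than a raw disk throughput limit.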

> Cluster details (backfilling expected and caused by some manual reweights):
>   cluster:
>     id:     96df99f6-fc1a-11ea-90a4-6cb3113cb732
>     health: HEALTH_WARN
>             24 pgs not deep-scrubbed in time

>   services:
>     mon:        5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 4d)
>     mgr:        ceph-node001.hgythj(active, since 11d), standbys: ceph-node002.jphtvg
>     mds:        20/20 daemons up, 12 standby
>     osd:        384 osds: 384 up (since 25h), 384 in (since 5d); 5 remapped pgs
>     rbd-mirror: 2 daemons active (2 hosts)
>     rgw:        64 daemons active (32 hosts, 1 zones)

>   data:
>     volumes: 1/1 healthy
>     pools:   14 pools, 8681 pgs
>     objects: 758.42M objects, 1.5 PiB
>     usage:   4.6 PiB used, 1.1 PiB / 5.7 PiB avail
>     pgs:     275177/2275254543 objects misplaced (0.012%)
>              6807 active+clean
>              989  active+clean+scrubbing+deep
>              880  active+clean+scrubbing
>              5    active+remapped+backfilling

>   io:
>     client:   37 MiB/s rd, 59 MiB/s wr, 1.72k op/s rd, 439 op/s wr
>     recovery: 70 MiB/s, 38 objects/s

> One thread from other users experiencing the same prolonged deep scrub issues on 19.2.0: https://www.reddit.com/r/ceph/comments/1guynak/strange_issue_where_scrubdeep_scrub_never_finishes/
> Any hints or help would be greatly appreciated!

> Thanks in advance,
> Laimis J. 
> laimis.juzeliunas@xxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx