Do you have osd_scrub_begin_hour / osd_scrub_end_hour set? Constraining the hours during which scrubs may run can cause them to pile up. Are you saying that an individual PG may take 20+ elapsed days to perform a deep scrub? (Some quick CLI checks for these settings are sketched below the quoted thread.)

> Might be the result of osd_scrub_chunk_max now being 15 instead of 25 previously. See [1] and [2].
>
> [1] https://tracker.ceph.com/issues/68057
> [2] https://github.com/ceph/ceph/pull/59791/commits/0841603023ba53923a986f2fb96ab7105630c9d3
>
> ----- On 26 Nov 24, at 23:36, Laimis Juzeliūnas laimis.juzeliunas@xxxxxxxxxx wrote:
>
>> Hello Ceph community,
>>
>> I wanted to highlight one observation and hear whether any Squid users have had similar experiences.
>> Since upgrading to 19.2.0 (from 18.4.0) we have observed that PG deep scrubbing times have increased drastically. Some PGs take 2-5 days to complete a deep scrub, while others take 20+ days. This causes the deep scrubbing queue to fill up, and the cluster almost constantly has 'pgs not deep-scrubbed in time' alerts.
>> We have on average 67 PGs/OSD; running on 15 TB HDDs this results in PGs of roughly 200 GB each. While fairly large, these PGs did not cause such an increase in deep scrub times on Reef.
>>
>> "ceph pg dump | grep 'deep scrubbing for'" always shows a few entries of quite morbid scrubs like the following:
>>
>> 7.3e    121289  0 0 0 0  225333247207  0 0  127  0  127  active+clean+scrubbing+deep  2024-11-13T09:37:42.549418+0000  490179'5220664   490179:23902923  [268,27,122]  268  [268,27,122]  268  483850'5203141   2024-11-02T11:33:57.835277+0000  472713'5197481   2024-10-11T04:30:00.639763+0000  0  21873   deep scrubbing for 1169147s
>> 34.247  62618   0 0 0 0  179797964677  0 0  101  50 101  active+clean+scrubbing+deep  2024-11-05T06:27:52.288785+0000  490179'22729571  490179:80672442  [34,97,25]    34   [34,97,25]    34   481331'22436869  2024-10-23T16:06:50.092439+0000  471395'22289914  2024-10-07T19:29:26.115047+0000  0  204864  deep scrubbing for 1871733s
>>
>> Not pointing any fingers, but the Squid release announced "better scrub scheduling".
>> This is not scheduling directly, but could that change have had an impact causing such behaviour?
>>
>> Scrubbing configurations:
>>
>> ceph config get osd | grep scrub
>> global  advanced  osd_deep_scrub_interval                          2678400.000000
>> global  advanced  osd_deep_scrub_large_omap_object_key_threshold   500000
>> global  advanced  osd_max_scrubs                                   5
>> global  advanced  osd_scrub_auto_repair                            true
>> global  advanced  osd_scrub_max_interval                           2678400.000000
>> global  advanced  osd_scrub_min_interval                           172800.000000
>>
>> Cluster details (the backfilling is expected and was caused by some manual reweights):
>>
>>   cluster:
>>     id:     96df99f6-fc1a-11ea-90a4-6cb3113cb732
>>     health: HEALTH_WARN
>>             24 pgs not deep-scrubbed in time
>>
>>   services:
>>     mon:        5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 4d)
>>     mgr:        ceph-node001.hgythj(active, since 11d), standbys: ceph-node002.jphtvg
>>     mds:        20/20 daemons up, 12 standby
>>     osd:        384 osds: 384 up (since 25h), 384 in (since 5d); 5 remapped pgs
>>     rbd-mirror: 2 daemons active (2 hosts)
>>     rgw:        64 daemons active (32 hosts, 1 zones)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   14 pools, 8681 pgs
>>     objects: 758.42M objects, 1.5 PiB
>>     usage:   4.6 PiB used, 1.1 PiB / 5.7 PiB avail
>>     pgs:     275177/2275254543 objects misplaced (0.012%)
>>              6807 active+clean
>>              989  active+clean+scrubbing+deep
>>              880  active+clean+scrubbing
>>              5    active+remapped+backfilling
>>
>>   io:
>>     client:   37 MiB/s rd, 59 MiB/s wr, 1.72k op/s rd, 439 op/s wr
>>     recovery: 70 MiB/s, 38 objects/s
>>
>> A thread of other users experiencing the same prolonged deep scrub issue on 19.2.0:
>> https://www.reddit.com/r/ceph/comments/1guynak/strange_issue_where_scrubdeep_scrub_never_finishes/
>>
>> Any hints or help would be greatly appreciated!
>>
>> Thanks in advance,
>> Laimis J.
>> laimis.juzeliunas@xxxxxxxxxx
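For reference, here is a rough sketch of how these settings could be checked from the CLI, assuming they live in the centralized config store (osd.0 below is only an example daemon; a per-daemon override in a local ceph.conf would be visible via "ceph config show" against that daemon but not via "ceph config get"):

  # Scrub time window; 0 / 0 (the default) means scrubs are not confined to certain hours
  ceph config get osd osd_scrub_begin_hour
  ceph config get osd osd_scrub_end_hour

  # Chunk size actually in effect on a running OSD (osd.0 is only an example)
  ceph config show osd.0 osd_scrub_chunk_max

  # Optionally set the chunk size back to the previous value of 25 mentioned in the quoted
  # reply ([1]/[2]) to see whether deep scrub times recover; scrub options like these can
  # normally be changed at runtime without restarting OSDs
  ceph config set osd osd_scrub_chunk_max 25

  # PGs whose deep scrub has been running the longest
  ceph pg dump pgs | grep 'deep scrubbing for'

If raising osd_scrub_chunk_max back to 25 shortens the deep scrubs, that would point at the chunking change rather than the new scheduler itself.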