Fwd: Re: Squid: deep scrub issues

FYI, even though I think this is unrelated to our problems, since we're running v18...

Michel

-------- Forwarded Message --------
Subject: 	Re: Squid: deep scrub issues
Date: 	Wed, 27 Nov 2024 17:15:32 +0100 (CET)
From: 	Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
To: 	Laimis Juzeliūnas <laimis.juzeliunas@xxxxxxxxxx>
Cc: 	ceph-users <ceph-users@xxxxxxx>



Hi Laimis,

Might be the result of osd_scrub_chunk_max now being 15 instead of 25 previously. See [1] and [2].

Cheers,
Frédéric.

[1] https://tracker.ceph.com/issues/68057
[2] https://github.com/ceph/ceph/pull/59791/commits/0841603023ba53923a986f2fb96ab7105630c9d3
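In case it helps the comparison, the effective value can be checked and, purely as an experiment, reverted to the previous default of 25 with standard config commands (a sketch, not a recommendation from this thread):

# Show the effective value (15 is the new Squid default, per [1] and [2]):
ceph config get osd osd_scrub_chunk_max
# Experimentally restore the pre-Squid default of 25:
ceph config set osd osd_scrub_chunk_max 25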

----- On Nov 26, 2024, at 23:36, Laimis Juzeliūnas laimis.juzeliunas@xxxxxxxxxx wrote:

Hello Ceph community,

Wanted to highlight one observation and hear from any Squid users having similar experiences.
Since upgrading to 19.2.0 (from 18.4.0) we have observed that PG deep scrubbing times have drastically increased. Some PGs take 2-5 days to complete deep scrubbing, while others take 20+ days. This causes the deep scrubbing queue to fill up, and the cluster almost constantly has 'pgs not deep-scrubbed in time' alerts.
We have on average 67 PGs/OSD; running on 15 TB HDDs, this results in PGs of roughly 200 GB. While fairly large, these PGs did not cause such an increase in deep scrub times on Reef.
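As a side note (added here, not in the original report), the ~200 GB figure checks out: 15 TB / 67 PGs is about 224 GB per PG at full capacity. The PGs behind the alert can be listed with a standard health query:

# List the PGs currently flagged by the 'pgs not deep-scrubbed in time' warning:
ceph health detail | grep 'not deep-scrubbed'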

"ceph pg dump | grep 'deep scrubbing for'" will always have a few entries of
quite morbid scrubs like the following:
7.3e   121289 0 0 0 0 225333247207 0 0 127  0  127 active+clean+scrubbing+deep 2024-11-13T09:37:42.549418+0000 490179'5220664  490179:23902923 [268,27,122] 268 [268,27,122] 268 483850'5203141  2024-11-02T11:33:57.835277+0000 472713'5197481  2024-10-11T04:30:00.639763+0000 0 21873  deep scrubbing for 1169147s
34.247 62618  0 0 0 0 179797964677 0 0 101 50 101 active+clean+scrubbing+deep 2024-11-05T06:27:52.288785+0000 490179'22729571 490179:80672442 [34,97,25]   34  [34,97,25]   34  481331'22436869 2024-10-23T16:06:50.092439+0000 471395'22289914 2024-10-07T19:29:26.115047+0000 0 204864 deep scrubbing for 1871733s
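To put those durations in perspective (simple arithmetic, not part of the original output): 1169147 s is roughly 13.5 days and 1871733 s roughly 21.7 days, matching the 20+ days mentioned above. A minimal one-liner to extract and convert them:

# Extract 'deep scrubbing for Ns' entries and convert seconds to days:
ceph pg dump 2>/dev/null | grep -o 'deep scrubbing for [0-9]*s' | awk '{sub("s","",$4); printf "%s s = %.1f days\n", $4, $4/86400}'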

Not pointing any fingers, but the Squid release announced "better scrub scheduling".
While this is not scheduling directly, could that change have had an impact causing this behaviour?

Scrubbing configurations:
ceph config get osd | grep scrub
global advanced osd_deep_scrub_interval                        2678400.000000
global advanced osd_deep_scrub_large_omap_object_key_threshold 500000
global advanced osd_max_scrubs                                 5
global advanced osd_scrub_auto_repair                          true
global advanced osd_scrub_max_interval                         2678400.000000
global advanced osd_scrub_min_interval                         172800.000000
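For readability (a unit conversion added here, not part of the original output): 2678400 s is 31 days and 172800 s is 2 days, so each PG is expected to deep scrub at least every 31 days; with individual deep scrubs already taking 20+ days, the queue has little slack.

# The two intervals expressed in days:
echo "$((2678400 / 86400)) days, $((172800 / 86400)) days"   # 31 days, 2 days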


Cluster details (backfilling expected and caused by some manual reweights):
cluster:
id: 96df99f6-fc1a-11ea-90a4-6cb3113cb732
health: HEALTH_WARN
24 pgs not deep-scrubbed in time

services:
mon: 5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 4d)
mgr: ceph-node001.hgythj (active, since 11d), standbys: ceph-node002.jphtvg
mds: 20/20 daemons up, 12 standby
osd: 384 osds: 384 up (since 25h), 384 in (since 5d); 5 remapped pgs
rbd-mirror: 2 daemons active (2 hosts)
rgw: 64 daemons active (32 hosts, 1 zones)

data:
volumes: 1/1 healthy
pools: 14 pools, 8681 pgs
objects: 758.42M objects, 1.5 PiB
usage: 4.6 PiB used, 1.1 PiB / 5.7 PiB avail
pgs: 275177/2275254543 objects misplaced (0.012%)
6807 active+clean
989 active+clean+scrubbing+deep
880 active+clean+scrubbing
5 active+remapped+backfilling

io:
client: 37 MiB/s rd, 59 MiB/s wr, 1.72k op/s rd, 439 op/s wr
recovery: 70 MiB/s, 38 objects/s
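Worth noting in the status above (an observation added here, not from the original mail): 989 + 880 = 1869 of 8681 PGs, about 21%, are scrubbing concurrently. A quick way to track that count over time:

# Count PGs currently scrubbing, deep vs. shallow (STATE is field 2 in pgs_brief):
ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /scrubbing\+deep/ {d++} $2 ~ /scrubbing$/ {s++} END {print d+0, "deep,", s+0, "shallow"}'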


One thread of other users experiencing the same prolonged deep scrub issues on 19.2.0:
https://www.reddit.com/r/ceph/comments/1guynak/strange_issue_where_scrubdeep_scrub_never_finishes/
Any hints or help would be greatly appreciated!


Thanks in advance,
Laimis J.
laimis.juzeliunas@xxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



