Hi,
Today we decided to upgrade from 18.2.0 to 18.2.2. We had no real hope of a
direct impact (nothing in the changelog mentions anything similar), but at
least all daemons were restarted, so we thought that maybe this would clear
the problem, at least temporarily. Unfortunately, that has not been the
case: the same PGs are still stuck, despite continuous scrubbing/deep
scrubbing activity in the cluster...
I'm happy to provide more information if somebody tells me what to look
at...
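For example, here is the kind of state I can easily collect if that is
useful (just standard CLI queries; these config options are simply the
first ones I would think of checking, nothing more):

    # Cluster-wide flags (noscrub / nodeep-scrub would show up here)
    ceph osd dump | grep flags
    # Scrub tuning currently in effect (defaults unless overridden)
    ceph config get osd osd_max_scrubs
    ceph config get osd osd_scrub_max_interval
    ceph config get osd osd_deep_scrub_interval
    # Per-PG scrub timestamps and states, kept for later comparison
    ceph pg dump pgs > /tmp/pg-dump-$(date +%F).txt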
Cheers,
Michel
On 21/03/2024 at 14:40, Bernhard Krieger wrote:
Hi,
I have the same issue: deep scrubs haven't finished on some PGs.
Using Ceph 18.2.2; the initially installed version was 18.0.0.
In the logs I see a lot of scrub/deep-scrub starts:
Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 deep-scrub starts
Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 deep-scrub starts
Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:20 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:32 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:33 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:22:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
Mar 21 14:22:37 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:38 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c scrub starts
Mar 21 14:22:39 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 deep-scrub starts
Mar 21 14:22:43 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:46 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:57 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:23:03 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
The amount of scrubbed/deep-scrubbed PGs changes every few seconds:
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 214 active+clean
50 active+clean+scrubbing+deep
25 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 207 active+clean
54 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 202 active+clean
56 active+clean+scrubbing+deep
31 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 213 active+clean
45 active+clean+scrubbing+deep
31 active+clean+scrubbing
ceph pg dump shows PGs which have not been deep scrubbed since January.
Some PGs have been deep scrubbing for over 700000 seconds:
[ceph: root@ceph-node10 /]# ceph pg dump pgs | grep -e 'scrubbing f'
5.6e 221223 0 0 0 0 927795290112 0 0 4073 3000 4073 active+clean+scrubbing+deep 2024-03-20T01:07:21.196293+0000 128383'15766927 128383:20517419 [2,4,18,16,14,21] 2 [2,4,18,16,14,21] 2 125519'12328877 2024-01-23T11:25:35.503811+0000 124844'11873951 2024-01-21T22:24:12.620693+0000 0 5 deep scrubbing for 270790s 53772 0
5.6c 221317 0 0 0 0 928173256704 0 0 6332 0 6332 active+clean+scrubbing+deep 2024-03-18T09:29:29.233084+0000 128382'15788196 128383:20727318 [6,9,12,14,1,4] 6 [6,9,12,14,1,4] 6 127180'14709746 2024-03-06T12:47:57.741921+0000 124817'11821502 2024-01-20T20:59:40.566384+0000 0 13452 deep scrubbing for 273519s 122803 0
5.6a 221325 0 0 0 0 928184565760 0 0 4649 3000 4649 active+clean+scrubbing+deep 2024-03-13T03:48:54.065125+0000 128382'16031499 128383:21221685 [13,11,1,2,9,8] 13 [13,11,1,2,9,8] 13 127181'14915404 2024-03-06T13:16:58.635982+0000 125967'12517899 2024-01-28T09:13:08.276930+0000 0 10078 deep scrubbing for 726001s 184819 0
5.54 221050 0 0 0 0 927036203008 0 0 4864 3000 4864 active+clean+scrubbing+deep 2024-03-18T00:17:48.086231+0000 128383'15584012 128383:20293678 [0,20,18,19,11,12] 0 [0,20,18,19,11,12] 0 127195'14651908 2024-03-07T09:22:31.078448+0000 124816'11813857 2024-01-20T16:43:15.755200+0000 0 9808 deep scrubbing for 306667s 142126 0
5.47 220849 0 0 0 0 926233448448 0 0 5592 0 5592 active+clean+scrubbing+deep 2024-03-12T08:10:39.413186+0000 128382'15653864 128383:20403071 [16,15,20,0,13,21] 16 [16,15,20,0,13,21] 16 127183'14600433 2024-03-06T18:21:03.057165+0000 124809'11792397 2024-01-20T05:27:07.617799+0000 0 13066 deep scrubbing for 796697s 209193 0
dumped pgs
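To spot the worst offenders quickly, one can sort the PGs by their last
deep-scrub timestamp, e.g. with jq (a sketch; depending on the release the
stats array in the JSON output sits under .pg_stats or .pg_map.pg_stats,
so the filter tries both):

    ceph pg dump pgs -f json 2>/dev/null \
      | jq -r '(.pg_stats // .pg_map.pg_stats)[]
               | [.pgid, .last_deep_scrub_stamp, .state] | @tsv' \
      | sort -k2 | head -20    # the 20 PGs with the oldest deep-scrub stamp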
regards
Bernhard
On 20/03/2024 21:12, Bandelow, Gunnar wrote:
Hi,
i just wanted to mention, that i am running a cluster with reef
18.2.1 with the same issue.
4 PGs start to deepscrub but dont finish since mid february. In the
pg dump they are shown as scheduled for deep scrub. They sometimes
change their status from active+clean to active+clean+scrubbing+deep
and back.
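In case it is useful for comparison, this is the kind of information one
can pull for a single affected PG (<pgid> and <N> below are placeholders;
dump_scrubs is an admin-socket command that may not be present on every
release):

    ceph pg map <pgid>                     # up/acting set and primary OSD
    ceph pg <pgid> query > pg-query.json   # full peering and scrub state
    # on the host running the primary OSD:
    ceph daemon osd.<N> dump_scrubs        # the OSD's internal scrub schedule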
Best regards,
Gunnar
=======================================================
Gunnar Bandelow
Universitätsrechenzentrum (URZ)
Universität Greifswald
Felix-Hausdorff-Straße 18
17489 Greifswald
Germany
Tel.: +49 3834 420 1450
--- Original Message ---
Subject: Re: Reef (18.2): Some PG not scrubbed/deep
scrubbed for 1 month
From: "Michel Jouvin" <michel.jouvin@xxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxx
Date: 20-03-2024 20:00
Hi Rafael,
Good to know I am not alone!
Additional information ~6h after the OSD restart: of the 20 PGs
impacted, 2 have been processed successfully... I don't have a clear
picture of how Ceph prioritizes the scrub of one PG over another; I
had thought that the oldest/expired scrubs are taken first, but it
may not be the case. Anyway, I have seen a very significant decrease
of the scrub activity this afternoon and the cluster is not loaded
at all (almost no users yet)...
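For what it's worth, these are the settings I intend to double-check,
since (to my understanding) they control when a PG becomes eligible for
(deep-)scrubbing and how many scrubs an OSD runs in parallel (option
names as documented for Reef):

    ceph config get osd osd_max_scrubs           # simultaneous scrubs per OSD
    ceph config get osd osd_scrub_min_interval   # scrub no more often than this
    ceph config get osd osd_scrub_max_interval   # scrub at least this often, regardless of load
    ceph config get osd osd_deep_scrub_interval  # target interval between deep scrubs
    ceph config get osd osd_scrub_load_threshold # skip new scrubs above this load
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_end_hour       # allowed time window for scheduled scrubs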
Michel
On 20/03/2024 at 17:55, quaglio@xxxxxxxxxx wrote:
> Hi,
> I upgraded a cluster 2 weeks ago here. The situation is the same
> as Michel's.
> A lot of PGs not scrubbed/deep-scrubbed.
>
> Rafael.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx