Hi,
Today we decided to upgrade from 18.2.0 to 18.2.2. We had no real hope of a
direct impact (nothing in the changelog mentions anything similar), but at
least all daemons were restarted, so we thought that maybe this would clear
the problem, at least temporarily. Unfortunately, that has not been the
case: the same PGs are still stuck, despite continuous scrubbing/deep
scrubbing activity in the cluster...
I'm happy to provide more information if somebody tells me what to look
at...
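For example, here is the kind of state I can easily collect if that is
useful (just standard CLI queries; these config options are simply the
first ones I would think of checking, nothing more):

    # Cluster-wide flags (noscrub / nodeep-scrub would show up here)
    ceph osd dump | grep flags
    # Scrub tuning currently in effect (defaults unless overridden)
    ceph config get osd osd_max_scrubs
    ceph config get osd osd_scrub_max_interval
    ceph config get osd osd_deep_scrub_interval
    # Per-PG scrub timestamps and states, kept for later comparison
    ceph pg dump pgs > /tmp/pg-dump-$(date +%F).txt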
Cheers,
Michel
On 21/03/2024 at 14:40, Bernhard Krieger wrote:
Hi,
I have the same issue: deep scrubs haven't finished on some PGs.
Using Ceph 18.2.2; the initially installed version was 18.0.0.
In the logs I see a lot of scrub/deep-scrub starts:
Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 deep-scrub starts
Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 deep-scrub starts
Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:20 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:32 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:33 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:22:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
Mar 21 14:22:37 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:38 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c scrub starts
Mar 21 14:22:39 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 deep-scrub starts
Mar 21 14:22:43 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:46 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:57 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:23:03 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
The amount of scrubbed/deep-scrubbed PGs changes every few seconds:
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 214 active+clean
50 active+clean+scrubbing+deep
25 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 207 active+clean
54 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 202 active+clean
56 active+clean+scrubbing+deep
31 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 213 active+clean
45 active+clean+scrubbing+deep
31 active+clean+scrubbing
ceph pg dump shows PGs which have not been deep scrubbed since January.
Some PGs have been deep scrubbing for over 700000 seconds:
[ceph: root@ceph-node10 /]# ceph pg dump pgs | grep -e 'scrubbing f'
5.6e 221223 0 0 0 0 927795290112 0 0 4073 3000 4073 active+clean+scrubbing+deep 2024-03-20T01:07:21.196293+0000 128383'15766927 128383:20517419 [2,4,18,16,14,21] 2 [2,4,18,16,14,21] 2 125519'12328877 2024-01-23T11:25:35.503811+0000 124844'11873951 2024-01-21T22:24:12.620693+0000 0 5 deep scrubbing for 270790s 53772 0
5.6c 221317 0 0 0 0 928173256704 0 0 6332 0 6332 active+clean+scrubbing+deep 2024-03-18T09:29:29.233084+0000 128382'15788196 128383:20727318 [6,9,12,14,1,4] 6 [6,9,12,14,1,4] 6 127180'14709746 2024-03-06T12:47:57.741921+0000 124817'11821502 2024-01-20T20:59:40.566384+0000 0 13452 deep scrubbing for 273519s 122803 0
5.6a 221325 0 0 0 0 928184565760 0 0 4649 3000 4649 active+clean+scrubbing+deep 2024-03-13T03:48:54.065125+0000 128382'16031499 128383:21221685 [13,11,1,2,9,8] 13 [13,11,1,2,9,8] 13 127181'14915404 2024-03-06T13:16:58.635982+0000 125967'12517899 2024-01-28T09:13:08.276930+0000 0 10078 deep scrubbing for 726001s 184819 0
5.54 221050 0 0 0 0 927036203008 0 0 4864 3000 4864 active+clean+scrubbing+deep 2024-03-18T00:17:48.086231+0000 128383'15584012 128383:20293678 [0,20,18,19,11,12] 0 [0,20,18,19,11,12] 0 127195'14651908 2024-03-07T09:22:31.078448+0000 124816'11813857 2024-01-20T16:43:15.755200+0000 0 9808 deep scrubbing for 306667s 142126 0
5.47 220849 0 0 0 0 926233448448 0 0 5592 0 5592 active+clean+scrubbing+deep 2024-03-12T08:10:39.413186+0000 128382'15653864 128383:20403071 [16,15,20,0,13,21] 16 [16,15,20,0,13,21] 16 127183'14600433 2024-03-06T18:21:03.057165+0000 124809'11792397 2024-01-20T05:27:07.617799+0000 0 13066 deep scrubbing for 796697s 209193 0
dumped pgs
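To spot the worst offenders quickly, one can sort the PGs by their last
deep-scrub timestamp, e.g. with jq (a sketch; depending on the release the
stats array in the JSON output sits under .pg_stats or .pg_map.pg_stats,
so the filter tries both):

    ceph pg dump pgs -f json 2>/dev/null \
      | jq -r '(.pg_stats // .pg_map.pg_stats)[]
               | [.pgid, .last_deep_scrub_stamp, .state] | @tsv' \
      | sort -k2 | head -20    # the 20 PGs with the oldest deep-scrub stamp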
regards
Bernhard
On 20/03/2024 21:12, Bandelow, Gunnar wrote:
Hi,
i just wanted to mention, that i am running a cluster with reef
18.2.1 with the same issue.
4 PGs start to deepscrub but dont finish since mid february. In the
pg dump they are shown as scheduled for deep scrub. They sometimes
change their status from active+clean to active+clean+scrubbing+deep
and back.
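In case it is useful for comparison, this is the kind of information one
can pull for a single affected PG (<pgid> and <N> below are placeholders;
dump_scrubs is an admin-socket command that may not be present on every
release):

    ceph pg map <pgid>                     # up/acting set and primary OSD
    ceph pg <pgid> query > pg-query.json   # full peering and scrub state
    # on the host running the primary OSD:
    ceph daemon osd.<N> dump_scrubs        # the OSD's internal scrub schedule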
Best regards,
Gunnar
=======================================================
Gunnar Bandelow
Universitätsrechenzentrum (URZ)
Universität Greifswald
Felix-Hausdorff-Straße 18
17489 Greifswald
Germany
Tel.: +49 3834 420 1450
--- Original Message ---
Subject: Re: Reef (18.2): Some PG not scrubbed/deep
scrubbed for 1 month
From: "Michel Jouvin" <michel.jouvin@xxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxx
Date: 20-03-2024 20:00
Hi Rafael,
Good to know I am not alone!
Additional information ~6h after the OSD restart: of the 20 PGs
impacted, 2 have been processed successfully... I don't have a clear
picture of how Ceph prioritizes the scrub of one PG over another; I
had thought that the oldest/expired scrubs are taken first, but it
may not be the case. Anyway, I have seen a very significant decrease
of the scrub activity this afternoon and the cluster is not loaded
at all (almost no users yet)...
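For what it's worth, these are the settings I intend to double-check,
since (to my understanding) they control when a PG becomes eligible for
(deep-)scrubbing and how many scrubs an OSD runs in parallel (option
names as documented for Reef):

    ceph config get osd osd_max_scrubs           # simultaneous scrubs per OSD
    ceph config get osd osd_scrub_min_interval   # scrub no more often than this
    ceph config get osd osd_scrub_max_interval   # scrub at least this often, regardless of load
    ceph config get osd osd_deep_scrub_interval  # target interval between deep scrubs
    ceph config get osd osd_scrub_load_threshold # skip new scrubs above this load
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_end_hour       # allowed time window for scheduled scrubs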
Michel
On 20/03/2024 at 17:55, quaglio@xxxxxxxxxx wrote:
> Hi,
> I upgraded a cluster 2 weeks ago here. The situation is the same
> as Michel's.
> A lot of PGs not scrubbed/deep-scrubbed.
>
> Rafael.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx