Hi,
I have the same issue: deep scrubs have not finished on some PGs.
We are running Ceph 18.2.2; the initially installed version was 18.0.0.
In the logs I see a lot of scrub/deep-scrub starts:
Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 deep-scrub starts
Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 deep-scrub starts
Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:20 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:32 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:33 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:22:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
Mar 21 14:22:37 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
Mar 21 14:22:38 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c scrub starts
Mar 21 14:22:39 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
Mar 21 14:22:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 deep-scrub starts
Mar 21 14:22:43 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
Mar 21 14:22:46 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
Mar 21 14:22:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
Mar 21 14:22:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
Mar 21 14:22:57 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
Mar 21 14:22:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
Mar 21 14:23:03 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
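The same PGs keep starting over and over. Finished scrubs should show up as "scrub ok" / "deep-scrub ok" in the same journal, so a rough way to compare starts versus completions per PG would be something like this (just a sketch; the journal unit name depends on how the OSDs are deployed, adjust as needed):

journalctl -u ceph-osd@<osd-id> --since "2 hours ago" \
    | grep -Eo '[0-9]+\.[0-9a-f]+ (deep-)?scrub (starts|ok)' \
    | sort | uniq -c | sort -rn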
The number of PGs in scrubbing/deep-scrubbing state changes every few seconds:
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 214 active+clean
50 active+clean+scrubbing+deep
25 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 208 active+clean
53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 207 active+clean
54 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 202 active+clean
56 active+clean+scrubbing+deep
31 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
pgs: 213 active+clean
45 active+clean+scrubbing+deep
31 active+clean+scrubbing
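If it helps, the counts can be watched continuously with something like the following (just a sketch, same grep as above):

watch -n 5 "ceph -s | grep 'active+clean'"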
ceph pg dump shows PGs that have not been deep scrubbed since January.
Some PGs have been deep scrubbing for over 700,000 seconds:
[ceph: root@ceph-node10 /]# ceph pg dump pgs | grep -e 'scrubbing f'
5.6e 221223 0 0 0 0 927795290112 0 0 4073 3000 4073 active+clean+scrubbing+deep 2024-03-20T01:07:21.196293+0000 128383'15766927 128383:20517419 [2,4,18,16,14,21] 2 [2,4,18,16,14,21] 2 125519'12328877 2024-01-23T11:25:35.503811+0000 124844'11873951 2024-01-21T22:24:12.620693+0000 0 5 deep scrubbing for 270790s 53772 0
5.6c 221317 0 0 0 0 928173256704 0 0 6332 0 6332 active+clean+scrubbing+deep 2024-03-18T09:29:29.233084+0000 128382'15788196 128383:20727318 [6,9,12,14,1,4] 6 [6,9,12,14,1,4] 6 127180'14709746 2024-03-06T12:47:57.741921+0000 124817'11821502 2024-01-20T20:59:40.566384+0000 0 13452 deep scrubbing for 273519s 122803 0
5.6a 221325 0 0 0 0 928184565760 0 0 4649 3000 4649 active+clean+scrubbing+deep 2024-03-13T03:48:54.065125+0000 128382'16031499 128383:21221685 [13,11,1,2,9,8] 13 [13,11,1,2,9,8] 13 127181'14915404 2024-03-06T13:16:58.635982+0000 125967'12517899 2024-01-28T09:13:08.276930+0000 0 10078 deep scrubbing for 726001s 184819 0
5.54 221050 0 0 0 0 927036203008 0 0 4864 3000 4864 active+clean+scrubbing+deep 2024-03-18T00:17:48.086231+0000 128383'15584012 128383:20293678 [0,20,18,19,11,12] 0 [0,20,18,19,11,12] 0 127195'14651908 2024-03-07T09:22:31.078448+0000 124816'11813857 2024-01-20T16:43:15.755200+0000 0 9808 deep scrubbing for 306667s 142126 0
5.47 220849 0 0 0 0 926233448448 0 0 5592 0 5592 active+clean+scrubbing+deep 2024-03-12T08:10:39.413186+0000 128382'15653864 128383:20403071 [16,15,20,0,13,21] 16 [16,15,20,0,13,21] 16 127183'14600433 2024-03-06T18:21:03.057165+0000 124809'11792397 2024-01-20T05:27:07.617799+0000 0 13066 deep scrubbing for 796697s 209193 0
dumped pgs
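To list the PGs with the oldest deep-scrub stamps, something like the following should work (a sketch; it assumes jq is available and that the JSON pg dump exposes a pg_stats array with pgid/last_deep_scrub_stamp/state):

ceph pg dump pgs --format json 2>/dev/null \
    | jq -r '.pg_stats[] | [.pgid, .last_deep_scrub_stamp, .state] | @tsv' \
    | sort -k2 | head -20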
Regards,
Bernhard
On 20/03/2024 21:12, Bandelow, Gunnar wrote:
Hi,
I just wanted to mention that I am running a cluster with Reef 18.2.1
and see the same issue.
Four PGs started to deep scrub in mid-February but have not finished since.
In the pg dump they are shown as scheduled for deep scrub. They sometimes
change their status from active+clean to active+clean+scrubbing+deep and back.
Best regards,
Gunnar
=======================================================
Gunnar Bandelow
Universitätsrechenzentrum (URZ)
Universität Greifswald
Felix-Hausdorff-Straße 18
17489 Greifswald
Germany
Tel.: +49 3834 420 1450
--- Original Message ---
Subject: Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month
From: "Michel Jouvin" <michel.jouvin@xxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxx
Date: 20-03-2024 20:00
Hi Rafael,
Good to know I am not alone!
Additional information ~6h after the OSD restart: of the 20 impacted PGs,
2 have been processed successfully... I don't have a clear picture of how
Ceph prioritizes the scrub of one PG over another; I had thought that the
oldest/expired scrubs would be taken first, but that may not be the case.
Anyway, I have seen a very significant decrease in scrub activity this
afternoon, and the cluster is not loaded at all (almost no users yet)...
Michel
On 20/03/2024 at 17:55, quaglio@xxxxxxxxxx wrote:
> Hi,
> I upgraded a cluster here 2 weeks ago. The situation is the same as
> Michel's.
> A lot of PGs are not scrubbed/deep-scrubbed.
>
> Rafael.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx