Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

Hi,

The attempt to rerun the bench was not really a success. I got the following messages:

-----

Mar 22 14:48:36 idr-osd2 ceph-osd[326854]: osd.29 83873 maybe_override_max_osd_capacity_for_qos osd bench result - bandwidth (MiB/sec): 10.910 iops: 2792.876 elapsed_sec: 1.074
Mar 22 14:48:36 idr-osd2 ceph-osd[326854]: log_channel(cluster) log [WRN] : OSD bench result of 2792.876456 IOPS exceeded the threshold limit of 500.000000 IOPS for osd.29. IOPS capacity is unchanged at 0.000000 IOPS. The recommendation is to establish the osd's IOPS capacity using other benchmark tools (e.g. Fio) and then override osd_mclock_max_capacity_iops_[hdd|ssd].
-----

As a first step, I decided to raise osd_mclock_max_capacity_iops_hdd for the suspect OSD to 50. It was magic! 16 of the 17 pending scrubs/deep scrubs have already completed and the last one is in progress.
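
For reference, the override is presumably just a config set along these lines (a sketch assuming osd.29 is the suspect OSD, as in the log above; adjust the id and value to your case):

ceph config set osd.29 osd_mclock_max_capacity_iops_hdd 50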

I now have to understand why this OSD performed so badly that osd_mclock_max_capacity_iops_hdd was set to such a low value... I have 12 OSDs with an entry for osd_mclock_max_capacity_iops_hdd, mostly on one server (plus 2 OSDs on another one), so I suspect there was a problem on these servers at some point. It is also unclear why simply rerunning the benchmark is not enough, and why such an implausibly low value is measured for an HDD...
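
In case it is useful to others, the per-OSD overrides can be listed with something like this (the grep pattern is just the option name):

ceph config dump | grep osd_mclock_max_capacity_iops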

Best regards,

Michel

On 22/03/2024 at 14:44, Michel Jouvin wrote:
Hi Frédéric,

I think you raise the right point; sorry if I misunderstood Pierre's suggestion to look at OSD performance. Just before reading your email, I was implementing Pierre's suggestion about osd_max_scrubs and I saw osd_mclock_max_capacity_iops_hdd entries for a few OSDs (I guess those with a value different from the default). For the suspect OSD the value is very low, 0.145327, and I suspect it is the cause of the problem. A few others have a value of ~5, which I also find very low (all OSDs use the same recent hardware/HDDs).

Thanks for this information. I'll follow your suggestions, rerun the benchmark and report whether it improves the situation.

Best regards,

Michel

On 22/03/2024 at 12:18, Frédéric Nass wrote:
Hello Michel,

Pierre also suggested checking the performance of this OSD's device(s), which can be done by running 'ceph tell osd.x bench'.

One thing I can think of is that the scrubbing speed of this particular OSD could be influenced by mclock scheduling, if the max IOPS capacity calculated by this OSD during its initialization is significantly lower than that of the other OSDs.

What I would do is check (from this OSD's log) the calculated value for max IOPS capacity and compare it to other OSDs. If needed, force a recalculation by setting 'ceph config set osd.x osd_mclock_force_run_benchmark_on_init true' and restarting this OSD.
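
For example (a sketch assuming the suspect OSD is osd.29 and a cephadm deployment; use your own OSD id and restart method):

ceph config set osd.29 osd_mclock_force_run_benchmark_on_init true
ceph orch daemon restart osd.29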

Also I would (example commands below):

- compare the running OSD's mclock values (cephadm shell ceph daemon osd.x config show | grep mclock) to those of other OSDs,
- compare 'ceph tell osd.x bench' to other OSDs' benchmarks,
- compare the rotational status of this OSD's db and data devices to other OSDs', to make sure things are in order.
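
Something along these lines (a sketch; osd.29 is assumed to be the suspect OSD and osd.30 a healthy OSD of the same type, adjust the ids):

# mclock settings seen by the running daemon (run on the OSD's host)
cephadm shell ceph daemon osd.29 config show | grep mclock
# bench the suspect OSD and a healthy one for comparison
ceph tell osd.29 bench
ceph tell osd.30 bench
# rotational status of the devices as reported by Ceph
ceph osd metadata 29 | grep -i rotational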

Bests,
Frédéric.

PS: If mclock is the culprit here, then setting osd_op_queue back to wpq for this OSD only would probably reveal it. I'm not sure about the implications of having a single OSD running a different scheduler in the cluster, though.
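
A minimal sketch of that test, assuming osd.29 and a cephadm deployment (osd_op_queue only takes effect after a restart):

ceph config set osd.29 osd_op_queue wpq
ceph orch daemon restart osd.29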


----- On 22 Mar 24, at 10:11, Michel Jouvin michel.jouvin@xxxxxxxxxxxxxxx wrote:

Pierre,

Yes, as mentioned in my initial email, I checked the OSD state and found nothing wrong either in the OSD logs or in the system logs (SMART errors).

Thanks for the advice of increasing osd_max_scrubs; I may try it, but I
doubt it is a contention problem because it really only affects a fixed
set of PGs (no new PGs have a "stuck scrub") and there is significant
scrubbing activity going on continuously (~10K PGs in the cluster).

Again, it is not a problem for me to kick out the suspect OSD and see if
it fixes the issue. But this cluster is pretty simple and has a low level
of activity, and I see nothing that could explain why we have this
situation on a fairly new cluster (9 months old, created in Quincy) and
not on our 2 other production clusters, which are much more heavily used.
One of them is the backend storage of a significant OpenStack cloud, a
cluster created 10 years ago with Infernalis and upgraded since then, a
better candidate for this kind of problem! So I'm happy to contribute to
troubleshooting a potential issue in Reef if somebody finds it useful
and can help. Otherwise I'll try the approach that worked for Gunnar.

Best regards,

Michel

On 22/03/2024 at 09:59, Pierre Riteau wrote:
Hello Michel,

It might be worth mentioning that the next releases of Reef and Quincy
should increase the default value of osd_max_scrubs from 1 to 3. See
the Reef pull request: https://github.com/ceph/ceph/pull/55173
You could try increasing this configuration setting if you
haven't already, but note that it can impact client I/O performance.
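
If you want to try it, a minimal sketch would be (3 matches the upcoming default; it can be reverted with 'ceph config rm osd osd_max_scrubs'):

ceph config set osd osd_max_scrubs 3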

Also, if the delays appear to be related to a single OSD, have you
checked the health and performance of this device?

On Fri, 22 Mar 2024 at 09:29, Michel Jouvin
<michel.jouvin@xxxxxxxxxxxxxxx> wrote:

     Hi,

     As I said in my initial message, I had in mind to do exactly the
     same, as I identified in my initial analysis that all the PGs with
     this problem share one OSD (but only 20 of the ~200 PGs hosted by
     that OSD have the problem). But as I don't feel I'm in an urgent
     situation, I was wondering whether collecting more information on
     the problem may have some value, and which information... If it
     helps, I add below the `pg dump` for the 17 PGs still with a
     "stuck scrub".

     I observed that the number of "stuck scrubs" is decreasing very
     slowly: in the last 12 hours, 1 more PG was successfully
     scrubbed/deep scrubbed. In case it was not clear in my initial
     message, the lists of PGs with a too old scrub and a too old deep
     scrub are the same.

     Without an answer, next week I may consider doing what you did:
     remove the suspect OSD (instead of just restarting it) and see if
     it unblocks the stuck scrubs.
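
     For the record, what worked for Gunnar was marking the OSDs out,
     i.e. presumably something like this (assuming osd.29 is the suspect
     OSD; 'ceph osd in 29' would bring it back afterwards):

     ceph osd out 29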

     Best regards,

     Michel

     --------------------------------- "ceph pg dump pgs" for the 17 PGs with a too old scrub and deep scrub (same list) ------------------------------------------------------------

     PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED MISPLACED UNFOUND
     BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG    LOG_DUPS DISK_LOG  STATE
     STATE_STAMP                      VERSION       REPORTED
     UP                 UP_PRIMARY  ACTING ACTING_PRIMARY
     LAST_SCRUB    SCRUB_STAMP LAST_DEEP_SCRUB
     DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN LAST_SCRUB_DURATION
     SCRUB_SCHEDULING OBJECTS_SCRUBBED  OBJECTS_TRIMMED
     29.7e3       260                   0         0 0 0
     1090519040            0           0   1978       500
     1978                 active+clean 2024-03-21T18:28:53.369789+0000
     39202'2478    83812:97136 [29,141,64,194]          29
     [29,141,64,194]              29 39202'2478
     2024-02-17T19:56:34.413412+0000       39202'2478
     2024-02-17T19:56:34.413412+0000              0 3 queued for deep
     scrub
     0                0
     25.7cc         0                   0         0 0 0
     0            0           0      0      1076 0
     active+clean 2024-03-21T18:09:48.104279+0000 46253'548
     83812:89843        [29,50,173]          29 [29,50,173]
     29     39159'536 2024-02-17T18:14:54.950401+0000 39159'536
     2024-02-17T18:14:54.950401+0000              0 1 queued for deep
     scrub
     0                0
     25.70c         0                   0         0 0 0
     0            0           0      0       918 0
     active+clean 2024-03-21T18:00:57.942902+0000 46253'514
     83812:95212 [29,195,185]          29 [29,195,185]              29
     39159'530  2024-02-18T03:56:17.559531+0000 39159'530
     2024-02-16T17:39:03.281785+0000              0 1 queued for deep
     scrub
     0                0
     29.70c       249                   0         0 0 0
     1044381696            0           0   1987       600
     1987                 active+clean 2024-03-21T18:35:36.848189+0000
     39202'2587    83812:99628 [29,138,63,12]          29
     [29,138,63,12]              29 39202'2587
     2024-02-17T21:34:22.042560+0000       39202'2587
     2024-02-17T21:34:22.042560+0000              0 1 queued for deep
     scrub
     0                0
     29.705       231                   0         0 0 0
     968884224            0           0   1959       500 1959
     active+clean 2024-03-21T18:18:22.028551+0000 39202'2459
     83812:91258 [29,147,173,61]          29 [29,147,173,61]
     29 39202'2459  2024-02-17T16:41:40.421763+0000 39202'2459
     2024-02-17T16:41:40.421763+0000              0 1 queued for deep
     scrub
     0                0
     29.6b9       236                   0         0 0 0
     989855744            0           0   1956       500 1956
     active+clean 2024-03-21T18:11:29.912132+0000 39202'2456
     83812:95607 [29,199,74,16]          29 [29,199,74,16]
     29 39202'2456  2024-02-17T11:46:06.706625+0000 39202'2456
     2024-02-17T11:46:06.706625+0000              0 1 queued for deep
     scrub
     0                0
     25.56e         0                   0         0 0 0
     0            0           0      0      1158 0
     active+clean+scrubbing+deep 2024-03-22T08:09:38.840145+0000
     46253'514   83812:637482 [111,29,128]         111
     [111,29,128]             111 39159'579
     2024-03-06T17:57:53.158936+0000        39159'579
     2024-03-06T17:57:53.158936+0000              0 1 queued for deep
     scrub
     0                0
     25.56a         0                   0         0 0 0
     0            0           0      0      1055 0
     active+clean 2024-03-21T18:00:57.940851+0000 46253'545
     83812:93475        [29,19,211]          29 [29,19,211]
     29     46253'545 2024-03-07T11:12:45.881545+0000 46253'545
     2024-03-07T11:12:45.881545+0000              0 28 queued for deep
     scrub
     0                0
     25.55a         0                   0         0 0 0
     0            0           0      0      1022 0
     active+clean 2024-03-21T18:10:24.124914+0000 46253'565
     83812:89876        [29,58,195]          29 [29,58,195]
     29     46253'561 2024-02-17T06:54:35.320454+0000 46253'561
     2024-02-17T06:54:35.320454+0000              0 28 queued for deep
     scrub
     0                0
     29.c0        256                   0         0 0 0
     1073741824            0           0   1986       600 1986
     active+clean+scrubbing+deep 2024-03-22T08:09:12.849868+0000
     39202'2586   83812:603625 [22,150,29,56]          22
     [22,150,29,56]              22 39202'2586
     2024-03-07T18:53:22.952868+0000       39202'2586
     2024-03-07T18:53:22.952868+0000              0 1 queued for deep
     scrub
     0                0
     18.6       15501                   0         0 0 0
     63959444676            0           0   2068      3000 2068
     active+clean+scrubbing+deep 2024-03-22T02:29:24.508889+0000
     81688'663900  83812:1272160 [187,29,211]         187
     [187,29,211]             187 52735'663878
     2024-03-06T16:36:32.080259+0000     52735'663878
     2024-03-06T16:36:32.080259+0000              0 684445 deep scrubbing
     for 20373s 449                0
     16.15          0                   0         0 0 0
     0            0           0      0         0 0
     active+clean 2024-03-21T18:20:29.632554+0000 0'0
     83812:104893        [29,165,85]          29 [29,165,85]
     29           0'0 2024-02-17T06:54:06.370647+0000              0'0
     2024-02-17T06:54:06.370647+0000              0 28 queued for deep
     scrub
     0                0
     25.45          0                   0         0 0 0
     0            0           0      0      1036 0
     active+clean 2024-03-21T18:10:24.125134+0000 39159'561
     83812:93649         [29,13,58]          29 [29,13,58]
     29     39159'512 2024-02-27T12:27:35.728176+0000 39159'512
     2024-02-27T12:27:35.728176+0000              0 1 queued for deep
     scrub
     0                0
     29.249       260                   0         0 0 0
     1090519040            0           0   1970       500
     1970                 active+clean 2024-03-21T18:29:22.588805+0000
     39202'2470    83812:96016 [29,191,18,143]          29
     [29,191,18,143]              29 39202'2470
     2024-02-17T13:32:42.910335+0000       39202'2470
     2024-02-17T13:32:42.910335+0000              0 1 queued for deep
     scrub
     0                0
     29.25a       248                   0         0 0 0
     1040187392            0           0   1952       600
     1952                 active+clean 2024-03-21T18:20:29.623422+0000
     39202'2552    83812:99157 [29,200,85,164]          29
     [29,200,85,164]              29 39202'2552
     2024-02-17T08:33:14.326087+0000       39202'2552
     2024-02-17T08:33:14.326087+0000              0 1 queued for deep
     scrub
     0                0
     25.3cf         0                   0         0 0 0
     0            0           0      0      1343 0
     active+clean 2024-03-21T18:16:00.933375+0000 46253'598
     83812:91659        [29,75,175]          29 [29,75,175]
     29     46253'598 2024-02-17T11:48:51.840600+0000 46253'598
     2024-02-17T11:48:51.840600+0000              0 28 queued for deep
     scrub
     0                0
     29.4ec       243                   0         0 0 0
     1019215872            0           0   1933       500
     1933                 active+clean 2024-03-21T18:15:35.389598+0000
     39202'2433   83812:101501 [29,206,63,17]          29
     [29,206,63,17]              29 39202'2433
     2024-02-17T15:10:41.027755+0000       39202'2433
     2024-02-17T15:10:41.027755+0000              0 3 queued for deep
     scrub
     0                0


     On 22/03/2024 at 08:16, Bandelow, Gunnar wrote:
     > Hi Michel,
     >
     > I think yesterday I found the culprit in my case.
     >
     > After inspecting "ceph pg dump", and especially the column
     > "last_scrub_duration", I found that every PG without proper
     > scrubbing was located on one of three OSDs (and all these OSDs
     > share the same SSD for their DB). I put them "out" and now, after
     > backfill and remapping, everything seems to be fine.
     >
     > Only the log is still flooded with "scrub starts" and I have no
     > clue why these OSDs are causing the problems.
     > Will investigate further.
     >
     > Best regards,
     > Gunnar
     >
     > ===================================
     >
     >  Gunnar Bandelow
     >  Universitätsrechenzentrum (URZ)
     >  Universität Greifswald
     >  Felix-Hausdorff-Straße 18
     >  17489 Greifswald
     >  Germany
     >
     >  Tel.: +49 3834 420 1450
     >
     >
     > --- Original Message ---
     > *Subject: * Re: Reef (18.2): Some PG not scrubbed/deep
     > scrubbed for 1 month
     > *From: *"Michel Jouvin" <michel.jouvin@xxxxxxxxxxxxxxx>
     > *To: *ceph-users@xxxxxxx
     > *Date: *21-03-2024 23:40
     >
     >
     >
     >     Hi,
     >
     >     Today we decided to upgrade from 18.2.0 to 18.2.2. No real
     >     hope of a direct impact (nothing in the change log related to
     >     something similar), but at least all daemons were restarted,
     >     so we thought that maybe this would clear the problem at least
     >     temporarily. Unfortunately that has not been the case. The
     >     same PGs are still stuck, despite continuous scrubbing/deep
     >     scrubbing activity in the cluster...
     >
     >     I'm happy to provide more information if somebody tells me
     >     what to look at...
     >
     >     Cheers,
     >
     >     Michel
     >
     >     On 21/03/2024 at 14:40, Bernhard Krieger wrote:
     >     > Hi,
     >     >
     >     > I have the same issues.
     >     > Deep scrubs haven't finished on some PGs.
     >     >
     >     > Using ceph 18.2.2.
     >     > The initially installed version was 18.0.0.
     >     >
     >     >
     >     > In the logs I see a lot of scrub/deep-scrub starts:
     >     >
     >     > Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
     >     > Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
     >     > Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
     >     > Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
     >     > Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
     >     > Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
     >     > Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 deep-scrub starts
     >     > Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
     >     > Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
     >     > Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
     >     > Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 deep-scrub starts
     >     > Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
     >     > Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
     >     > Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
     >     > Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 scrub starts
     >     > Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
     >     > Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c deep-scrub starts
     >     > Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
     >     > Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 scrub starts
     >     > Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
     >     > Mar 21 14:22:20 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
     >     > Mar 21 14:22:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
     >     > Mar 21 14:22:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
     >     > Mar 21 14:22:32 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
     >     > Mar 21 14:22:33 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
     >     > Mar 21 14:22:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
     >     > Mar 21 14:22:37 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 14.6 scrub starts
     >     > Mar 21 14:22:38 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 10.c scrub starts
     >     > Mar 21 14:22:39 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 12.3 scrub starts
     >     > Mar 21 14:22:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 6.0 deep-scrub starts
     >     > Mar 21 14:22:43 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 8.5 deep-scrub starts
     >     > Mar 21 14:22:46 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.66 deep-scrub starts
     >     > Mar 21 14:22:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 5.30 scrub starts
     >     > Mar 21 14:22:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.b deep-scrub starts
     >     > Mar 21 14:22:57 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1a deep-scrub starts
     >     > Mar 21 14:22:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 13.1c deep-scrub starts
     >     > Mar 21 14:23:03 ceph-node10 ceph-osd[3804193]: log_channel(cluster) log [DBG] : 11.1 deep-scrub starts
     >     >
     >     >
     >     >
     >     > The amount of scrubbed/deep-scrubbed PGs changes every few
     >     > seconds.
     >     >
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     214 active+clean
     >     >             50 active+clean+scrubbing+deep
     >     >             25 active+clean+scrubbing
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     208 active+clean
     >     >             53 active+clean+scrubbing+deep
     >     >             28 active+clean+scrubbing
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     208 active+clean
     >     >             53 active+clean+scrubbing+deep
     >     >             28 active+clean+scrubbing
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     207 active+clean
     >     >             54 active+clean+scrubbing+deep
     >     >             28 active+clean+scrubbing
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     202 active+clean
     >     >             56 active+clean+scrubbing+deep
     >     >             31 active+clean+scrubbing
     >     > [root@ceph-node10 ~]# ceph -s | grep active+clean
     >     >    pgs:     213 active+clean
     >     >             45 active+clean+scrubbing+deep
     >     >             31 active+clean+scrubbing
     >     >
     >     > ceph pg dump shows PGs which have not been deep scrubbed
     >     > since January. Some PGs have been deep scrubbing for over
     >     > 700000 seconds.
     >     >
     >     > [ceph: root@ceph-node10 /]# ceph pg dump pgs | grep -e 'scrubbing f'
     >     > 5.6e      221223                   0         0          0
            0
     >     >  927795290112            0           0  4073      3000
          4073
     >     >  active+clean+scrubbing+deep  2024-03-20T01:07:21.196293+
     >     > 0000  128383'15766927  128383:20517419   [2,4,18,16,14,21]
     >               2
     >     >   [2,4,18,16,14,21]               2  125519'12328877
     >     >  2024-01-23T11:25:35.503811+0000  124844'11873951
      2024-01-21T22:
     >     > 24:12.620693+0000              0                    5  deep
     >     scrubbing
     >     > for 270790s                                             53772
     >     >                0
     >     > 5.6c      221317                   0         0          0
            0
     >     >  928173256704            0           0  6332         0
          6332
     >     >  active+clean+scrubbing+deep  2024-03-18T09:29:29.233084+
     >     > 0000  128382'15788196  128383:20727318     [6,9,12,14,1,4]
     >               6
     >     >     [6,9,12,14,1,4]               6  127180'14709746
     >     >  2024-03-06T12:47:57.741921+0000  124817'11821502
      2024-01-20T20:
     >     > 59:40.566384+0000              0                13452  deep
     >     scrubbing
     >     > for 273519s                                            122803
     >     >                0
     >     > 5.6a      221325                   0         0          0
            0
     >     >  928184565760            0           0  4649      3000
          4649
     >     >  active+clean+scrubbing+deep  2024-03-13T03:48:54.065125+
     >     > 0000  128382'16031499  128383:21221685     [13,11,1,2,9,8]
     >              13
     >     >     [13,11,1,2,9,8]              13  127181'14915404
     >     >  2024-03-06T13:16:58.635982+0000  125967'12517899
      2024-01-28T09:
     >     > 13:08.276930+0000              0                10078  deep
     >     scrubbing
     >     > for 726001s                                            184819
     >     >                0
     >     > 5.54      221050                   0         0          0
            0
     >     >  927036203008            0           0  4864      3000
          4864
     >     >  active+clean+scrubbing+deep  2024-03-18T00:17:48.086231+
     >     > 0000  128383'15584012  128383:20293678  [0,20,18,19,11,12]
     >               0
     >     >  [0,20,18,19,11,12]               0  127195'14651908
     >     >  2024-03-07T09:22:31.078448+0000  124816'11813857
      2024-01-20T16:
     >     > 43:15.755200+0000              0                 9808  deep
     >     scrubbing
     >     > for 306667s                                            142126
     >     >                0
     >     > 5.47      220849                   0         0          0
            0
     >     >  926233448448            0           0  5592         0
          5592
     >     >  active+clean+scrubbing+deep  2024-03-12T08:10:39.413186+
     >     > 0000  128382'15653864  128383:20403071  [16,15,20,0,13,21]
     >              16
     >     >  [16,15,20,0,13,21]              16  127183'14600433
     >     >  2024-03-06T18:21:03.057165+0000  124809'11792397
      2024-01-20T05:
     >     > 27:07.617799+0000              0                13066  deep
     >     scrubbing
     >     > for 796697s                                            209193
     >     >                0
     >     > dumped pgs
     >     >
     >     >
     >     >
     >     >
     >     > regards
     >     > Bernhard
     >     >
     >     >
     >     >
     >     >
     >     >
     >     >
     >     > On 20/03/2024 21:12, Bandelow, Gunnar wrote:
     >     >> Hi,
     >     >>
     >     >> I just wanted to mention that I am running a cluster with
     >     >> reef 18.2.1 with the same issue.
     >     >>
     >     >> 4 PGs have been starting to deep scrub but not finishing
     >     >> since mid February. In the pg dump they are shown as
     >     >> scheduled for deep scrub. They sometimes change their status
     >     >> from active+clean to active+clean+scrubbing+deep and back.
     >     >>
     >     >> Best regards,
     >     >> Gunnar
     >     >>
     >     >> =======================================================
     >     >>
     >     >> Gunnar Bandelow
     >     >> Universitätsrechenzentrum (URZ)
     >     >> Universität Greifswald
     >     >> Felix-Hausdorff-Straße 18
     >     >> 17489 Greifswald
     >     >> Germany
     >     >>
     >     >> Tel.: +49 3834 420 1450
     >     >>
     >     >>
     >     >>
     >     >>
     >     >> --- Original Message ---
     >     >> *Subject: * Re: Reef (18.2): Some PG not scrubbed/deep
     >     >> scrubbed for 1 month
     >     >> *From: *"Michel Jouvin" <michel.jouvin@xxxxxxxxxxxxxxx>
     >     >> *To: *ceph-users@xxxxxxx
     >     >> *Date: *20-03-2024 20:00
     >     >>
     >     >>
     >     >>
     >     >>     Hi Rafael,
     >     >>
     >     >>     Good to know I am not alone!
     >     >>
     >     >>     Additional information ~6h after the OSD restart: of
     >     >>     the 20 PGs impacted, 2 have been processed
     >     >>     successfully... I don't have a clear picture of how Ceph
     >     >>     prioritizes the scrub of one PG over another; I had
     >     >>     thought that the oldest/expired scrubs are taken first,
     >     >>     but it may not be the case. Anyway, I have seen a very
     >     >>     significant decrease of the scrub activity this
     >     >>     afternoon and the cluster is not loaded at all (almost
     >     >>     no users yet)...
     >     >>
     >     >>     Michel
     >     >>
     >     >>     On 20/03/2024 at 17:55, quaglio@xxxxxxxxxx wrote:
     >     >>     > Hi,
     >     >>     >      I upgraded a cluster 2 weeks ago here. The
     >     >>     > situation is the same as Michel's.
     >     >>     >      A lot of PGs not scrubbed/deep-scrubbed.
     >     >>     >
     >     >>     > Rafael.
     >     >>     >

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



