Hello Joffrey,

I am not sure why my previous reply did not go through, so I am replying on
this thread again.

The slow backfill could also be caused by mclock_scheduler throttling
recovery operations too aggressively; this issue is currently being fixed.
To confirm whether mclock_scheduler is the cause here, could you please run
the following commands and post the output on this thread?

1. ceph versions
2. ceph config show osd.<id> | grep osd_max_backfills
3. ceph config show osd.<id> | grep osd_recovery_max_active
4. ceph config show osd.<id> | grep osd_mclock

With mclock_scheduler enabled on 17.2.5, it is not possible to override
recovery options such as osd_max_backfills, osd_recovery_max_active and the
other related settings. Based on the output of the commands above, I can
confirm whether mclock_scheduler is causing the issue and suggest next steps.
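If it helps, the same checks can be wrapped in a small shell loop; this is
only a sketch, and osd.0 below is a placeholder for whichever OSD id you pick
(any id from `ceph osd tree` will do):

    # Read-only checks; replace osd.0 with a real OSD id from your cluster.
    ceph versions
    for opt in osd_max_backfills osd_recovery_max_active osd_mclock; do
        ceph config show osd.0 | grep "$opt"
    done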
-Sridhar

On Fri, Mar 3, 2023 at 9:35 PM Joffrey <joff.au@xxxxxxxxx> wrote:

> Ok, thanks. You mean that the autoscale feature is... stupid?
> I'm going to change pgp_num and use the legacy formula: OSDs * 100 / pool
> size.
>
> On Thu, Mar 2, 2023 at 5:04 PM Curt <lightspd@xxxxxxxxx> wrote:
>
> > I see autoscale_mode on all pools and I'm guessing this is your largest
> > pool, bkp365-ncy.rgw.buckets.data, with 32 PGs. I would definitely turn
> > off autoscale and increase pg_num/pgp_num. Someone with more experience
> > than I can chime in, but I would think something like 2048 would be much
> > better.
> >
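For reference, a rough sketch of what that would look like for the large data
pool discussed above; the numbers are an illustration only (the legacy formula
gives 60 * 100 / 5 = 1200 for an EC 3+2 pool, so something in the 1024 to 2048
range), not a tested recommendation:

    # Legacy rule of thumb: (OSD count * 100) / pool size, rounded to a power of two.
    # Here: 60 OSDs * 100 / 5 (EC 3+2, size 5) = 1200, i.e. roughly 1024-2048 PGs.
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_autoscale_mode off
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_num 1024
    ceph osd pool set bkp365-ncy.rgw.buckets.data pgp_num 1024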
> > On Thu, Mar 2, 2023 at 6:12 PM Joffrey <joff.au@xxxxxxxxx> wrote:
> >
> >> root@hbgt-ceph1-mon3:/# ceph osd df
> >> ID  CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> >>  1  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   11 KiB   23 GiB   11 TiB  36.17  1.39   17  up
> >>  3  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.7 GiB   17 GiB   12 TiB  28.47  1.09   11  up
> >>  5  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.2 GiB   12 GiB   14 TiB  20.89  0.80   13  up
> >>  7  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.2 GiB  6.9 GiB   15 TiB  13.32  0.51   19  up
> >>  9  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   68 MiB   18 GiB   12 TiB  28.53  1.09   18  up
> >> 11  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  403 MiB   23 GiB   11 TiB  36.13  1.38   17  up
> >> 13  hdd  17.34140  1.00000  17 TiB  1001 GiB  7.1 GiB  9.9 MiB  1.1 GiB   16 TiB   5.64  0.22   18  up
> >> 15  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB  842 KiB   34 GiB  8.4 TiB  51.41  1.97   18  up
> >> 17  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.90  0.80   17  up
> >> 19  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  4.1 GiB  6.2 GiB   15 TiB  13.31  0.51   18  up
> >> 21  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  206 MiB   17 GiB   12 TiB  28.55  1.09   23  up
> >> 23  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.54  1.09   14  up
> >>  0  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  7.2 GiB   12 GiB   14 TiB  20.94  0.80   18  up
> >>  2  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.93  0.80   13  up
> >>  4  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.0 GiB   12 GiB   14 TiB  20.95  0.80   20  up
> >>  6  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB  4.4 MiB   34 GiB  8.4 TiB  51.36  1.97   17  up
> >>  8  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  965 KiB  6.5 GiB   15 TiB  13.26  0.51   14  up
> >> 10  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB   18 KiB  6.5 GiB   15 TiB  13.25  0.51   13  up
> >> 12  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   98 MiB   17 GiB   12 TiB  28.49  1.09   16  up
> >> 14  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.55  1.09   20  up
> >> 16  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.94  0.80   20  up
> >> 18  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB   17 MiB   34 GiB  8.4 TiB  51.42  1.97   19  up
> >> 20  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.2 GiB   17 GiB   12 TiB  28.50  1.09   18  up
> >> 22  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  2.7 GiB  6.2 GiB   15 TiB  13.25  0.51   11  up
> >> 24  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   70 MiB   17 GiB   12 TiB  28.50  1.09   18  up
> >> 25  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.0 GiB   17 GiB   12 TiB  28.51  1.09   16  up
> >> 26  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  3.0 GiB   23 GiB   11 TiB  36.13  1.38   15  up
> >> 27  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  205 MiB   17 GiB   12 TiB  28.59  1.10   16  up
> >> 28  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  1.0 MiB  6.3 GiB   15 TiB  13.27  0.51   12  up
> >> 29  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  1.3 MiB   17 GiB   12 TiB  28.50  1.09    4  up
> >> 30  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  379 KiB   23 GiB   11 TiB  36.14  1.38   16  up
> >> 31  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  2.5 MiB   12 GiB   14 TiB  20.92  0.80   19  up
> >> 32  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   11 MiB   12 GiB   14 TiB  20.93  0.80   16  up
> >> 33  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.91  0.80   17  up
> >> 34  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   71 MiB   23 GiB   11 TiB  36.15  1.38   19  up
> >> 35  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.3 GiB   15 TiB  13.28  0.51   14  up
> >> 36  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB      0 B   17 GiB   12 TiB  28.59  1.09   13  up
> >> 37  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   69 MiB   17 GiB   12 TiB  28.54  1.09   12  up
> >> 38  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  2.9 GiB  6.7 GiB   15 TiB  13.26  0.51   22  up
> >> 39  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  205 MiB   23 GiB   11 TiB  36.19  1.39   25  up
> >> 40  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB    9 KiB   12 GiB   14 TiB  20.88  0.80   14  up
> >> 41  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  8.2 GiB   23 GiB   11 TiB  36.11  1.38   20  up
> >> 42  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   55 KiB   12 GiB   14 TiB  20.91  0.80   16  up
> >> 43  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   70 MiB   23 GiB   11 TiB  36.17  1.39   21  up
> >> 44  hdd  17.34140  1.00000  17 TiB   7.6 TiB  6.6 TiB   18 KiB   28 GiB  9.8 TiB  43.75  1.68   16  up
> >> 45  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  141 MiB  6.5 GiB   15 TiB  13.29  0.51   17  up
> >> 46  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  1.7 MiB  6.4 GiB   15 TiB  13.27  0.51   15  up
> >> 47  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.5 GiB   11 GiB   14 TiB  20.89  0.80   22  up
> >> 48  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB    9 KiB  6.3 GiB   15 TiB  13.25  0.51   10  up
> >> 49  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB    4 KiB   33 GiB  8.4 TiB  51.41  1.97   18  up
> >> 50  hdd  17.34140  1.00000  17 TiB   7.6 TiB  6.6 TiB  212 MiB   31 GiB  9.7 TiB  43.81  1.68   20  up
> >> 51  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.6 TiB   85 MiB   13 GiB   14 TiB  20.87  0.80   19  up
> >> 52  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  5.4 GiB  6.0 GiB   15 TiB  13.34  0.51   18  up
> >> 53  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB   25 MiB   19 GiB   12 TiB  28.55  1.09   16  up
> >> 54  hdd  17.34140  1.00000  17 TiB   6.2 TiB  5.3 TiB  198 MiB   23 GiB   11 TiB  35.99  1.38   14  up
> >> 55  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB   10 GiB   18 GiB   12 TiB  28.59  1.09   26  up
> >> 56  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  153 MiB   24 GiB   11 TiB  36.14  1.38   22  up
> >> 57  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   58 KiB   12 GiB   14 TiB  20.91  0.80   13  up
> >> 58  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.4 GiB   15 TiB  13.23  0.51   11  up
> >> 59  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB   19 KiB  6.3 GiB   15 TiB  13.27  0.51   11  up
> >>               TOTAL         1.0 PiB   272 TiB  213 TiB   84 GiB  942 GiB  769 TiB  26.11
> >>
> >> root@hbgt-ceph1-mon3:/# ceph osd dump | grep pool
> >> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 15503 lfor 0/8533/8531 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
> >> pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8321 lfor 0/8321/8319 flags hashpspool stripe_width 0 application rgw
> >> pool 3 'bkp365-ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8297 lfor 0/8297/8295 flags hashpspool stripe_width 0 application rgw
> >> pool 4 'bkp365-ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8054 lfor 0/8054/8052 flags hashpspool stripe_width 0 application rgw
> >> pool 5 'bkp365-ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3412 lfor 0/3412/3410 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 6 'bkp365-ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3500 lfor 0/0/2720 flags hashpspool stripe_width 12288 application rgw
> >> pool 7 'bkp365-ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3436 lfor 0/3436/3434 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 9 'ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14975 lfor 0/0/14973 flags hashpspool stripe_width 12288 application rgw
> >> pool 10 'ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14979 flags hashpspool stripe_width 0 application rgw
> >> pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14981 flags hashpspool stripe_width 0 application rgw
> >> pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105 lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236 lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 15241 flags hashpspool stripe_width 0 application rgw
> >>
> >> (EC32 is an erasure-code profile with 3 data chunks and 2 coding chunks.)
> >>
> >> "ceph osd pool autoscale-status" returns no output.
> >>
> >> On Thu, Mar 2, 2023 at 3:02 PM Curt <lightspd@xxxxxxxxx> wrote:
> >>
> >>> Forgot to do a reply all.
> >>>
> >>> What does
> >>>
> >>>   ceph osd df
> >>>   ceph osd dump | grep pool
> >>>
> >>> return?
> >>>
> >>> Are you using autoscaling? 289 PGs with 272 TB of data on 60 OSDs:
> >>> that seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm
> >>> thinking of this wrong.
> >>>
> >>> On Thu, Mar 2, 2023 at 5:37 PM Joffrey <joff.au@xxxxxxxxx> wrote:
> >>>
> >>>> My Ceph version is 17.2.5 and all osd_scrub* settings are at their
> >>>> defaults. I tried some changes to osd_max_backfills but saw no
> >>>> difference. The OSDs are HDDs with NVMe for the DB, and everything is
> >>>> connected over a 25G network.
> >>>>
> >>>> Yes, it has been the same PG for 4 days.
> >>>>
> >>>> I had an HDD failure and went through many days of
> >>>> recovery+backfilling over the last 2 weeks. Perhaps the 'not in time'
> >>>> warnings are related to this.
> >>>>
> >>>> 'Jof
> >>>>
> >>>> On Thu, Mar 2, 2023 at 2:25 PM Anthony D'Atri <aad@xxxxxxxxxxxxxx>
> >>>> wrote:
> >>>>
> >>>> > Run `ceph health detail`.
> >>>> >
> >>>> > Is it the same PG backfilling for a long time, or a different one
> >>>> > over time?
> >>>> >
> >>>> > That it's remapped makes me think that what you're seeing is the
> >>>> > balancer doing its job.
> >>>> >
> >>>> > As far as the scrubbing, do you limit the times when scrubbing can
> >>>> > happen? Are these HDDs? EC?
> >>>> >
> >>>> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
> >>>> > >
> >>>> > > Hi,
> >>>> > >
> >>>> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
> >>>> > > remapped+backfilling, and I don't understand why this backfill is
> >>>> > > taking so long.
> >>>> > >
> >>>> > > root@hbgt-ceph1-mon3:/# ceph -s
> >>>> > >   cluster:
> >>>> > >     id:     c300532c-51fa-11ec-9a41-0050569c3b55
> >>>> > >     health: HEALTH_WARN
> >>>> > >             15 pgs not deep-scrubbed in time
> >>>> > >             13 pgs not scrubbed in time
> >>>> > >
> >>>> > >   services:
> >>>> > >     mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
> >>>> > >     mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys: hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
> >>>> > >     osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
> >>>> > >     rgw: 3 daemons active (3 hosts, 2 zones)
> >>>> > >
> >>>> > >   data:
> >>>> > >     pools:   13 pools, 289 pgs
> >>>> > >     objects: 67.74M objects, 127 TiB
> >>>> > >     usage:   272 TiB used, 769 TiB / 1.0 PiB avail
> >>>> > >     pgs:     288 active+clean
> >>>> > >              1   active+remapped+backfilling
> >>>> > >
> >>>> > >   io:
> >>>> > >     client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
> >>>> > >     recovery: 790 KiB/s, 0 objects/s
> >>>> > >
> >>>> > > What can I do to understand this slow recovery (is it the backfill
> >>>> > > action)?
> >>>> > >
> >>>> > > Thank you
> >>>> > >
> >>>> > > 'Jof
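Putting the suggestions from this thread together, a minimal way to identify
and inspect the PG that is backfilling; the PG id 6.1f below is purely a
placeholder for whatever id the first two commands report:

    # List unhealthy PGs and the ones currently backfilling.
    ceph health detail
    ceph pg ls backfilling
    # Query the reported PG (replace 6.1f with the actual id) to see the
    # up/acting OSD sets and the recovery state.
    ceph pg 6.1f query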
--
Sridhar Seshasayee
Partner Engineer
Red Hat <https://www.redhat.com>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx