OK, thanks. You mean that the autoscale feature is... stupid? I'm going to
change pg_num/pgp_num and use the legacy formula (OSDs * 100 / pool size).

On Thu, Mar 2, 2023 at 5:04 PM Curt <lightspd@xxxxxxxxx> wrote:

> I see autoscale_mode on for all pools, and I'm guessing this is your largest
> pool, bkp365-ncy.rgw.buckets.data, with 32 PGs. I would definitely turn off
> autoscale and increase pg_num/pgp_num. Someone with more experience than I
> can chime in, but I would think something like 2048 would be much better.
>
> On Thu, Mar 2, 2023 at 6:12 PM Joffrey <joff.au@xxxxxxxxx> wrote:
>
>> root@hbgt-ceph1-mon3:/# ceph osd df
>> ID  CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>>  1  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   11 KiB   23 GiB   11 TiB   36.17  1.39  17  up
>>  3  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   3.7 GiB  17 GiB   12 TiB   28.47  1.09  11  up
>>  5  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   3.2 GiB  12 GiB   14 TiB   20.89  0.80  13  up
>>  7  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   3.2 GiB  6.9 GiB  15 TiB   13.32  0.51  19  up
>>  9  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   68 MiB   18 GiB   12 TiB   28.53  1.09  18  up
>> 11  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   403 MiB  23 GiB   11 TiB   36.13  1.38  17  up
>> 13  hdd  17.34140  1.00000  17 TiB  1001 GiB  7.1 GiB   9.9 MiB  1.1 GiB  16 TiB    5.64  0.22  18  up
>> 15  hdd  17.34140  1.00000  17 TiB  8.9 TiB   7.9 TiB   842 KiB  34 GiB   8.4 TiB  51.41  1.97  18  up
>> 17  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   24 KiB   12 GiB   14 TiB   20.90  0.80  17  up
>> 19  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   4.1 GiB  6.2 GiB  15 TiB   13.31  0.51  18  up
>> 21  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   206 MiB  17 GiB   12 TiB   28.55  1.09  23  up
>> 23  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   4.2 GiB  17 GiB   12 TiB   28.54  1.09  14  up
>>  0  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   7.2 GiB  12 GiB   14 TiB   20.94  0.80  18  up
>>  2  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   18 KiB   12 GiB   14 TiB   20.93  0.80  13  up
>>  4  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   3.0 GiB  12 GiB   14 TiB   20.95  0.80  20  up
>>  6  hdd  17.34140  1.00000  17 TiB  8.9 TiB   7.9 TiB   4.4 MiB  34 GiB   8.4 TiB  51.36  1.97  17  up
>>  8  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   965 KiB  6.5 GiB  15 TiB   13.26  0.51  14  up
>> 10  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   18 KiB   6.5 GiB  15 TiB   13.25  0.51  13  up
>> 12  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   98 MiB   17 GiB   12 TiB   28.49  1.09  16  up
>> 14  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   4.2 GiB  17 GiB   12 TiB   28.55  1.09  20  up
>> 16  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   24 KiB   12 GiB   14 TiB   20.94  0.80  20  up
>> 18  hdd  17.34140  1.00000  17 TiB  8.9 TiB   7.9 TiB   17 MiB   34 GiB   8.4 TiB  51.42  1.97  19  up
>> 20  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   3.2 GiB  17 GiB   12 TiB   28.50  1.09  18  up
>> 22  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   2.7 GiB  6.2 GiB  15 TiB   13.25  0.51  11  up
>> 24  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   70 MiB   17 GiB   12 TiB   28.50  1.09  18  up
>> 25  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   3.0 GiB  17 GiB   12 TiB   28.51  1.09  16  up
>> 26  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   3.0 GiB  23 GiB   11 TiB   36.13  1.38  15  up
>> 27  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   205 MiB  17 GiB   12 TiB   28.59  1.10  16  up
>> 28  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   1.0 MiB  6.3 GiB  15 TiB   13.27  0.51  12  up
>> 29  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   1.3 MiB  17 GiB   12 TiB   28.50  1.09   4  up
>> 30  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   379 KiB  23 GiB   11 TiB   36.14  1.38  16  up
>> 31  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   2.5 MiB  12 GiB   14 TiB   20.92  0.80  19  up
>> 32  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   11 MiB   12 GiB   14 TiB   20.93  0.80  16  up
>> 33  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   18 KiB   12 GiB   14 TiB   20.91  0.80  17  up
>> 34  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   71 MiB   23 GiB   11 TiB   36.15  1.38  19  up
>> 35  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   3.3 GiB  6.3 GiB  15 TiB   13.28  0.51  14  up
>> 36  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   0 B      17 GiB   12 TiB   28.59  1.09  13  up
>> 37  hdd  17.34140  1.00000  17 TiB  4.9 TiB   4.0 TiB   69 MiB   17 GiB   12 TiB   28.54  1.09  12  up
>> 38  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   2.9 GiB  6.7 GiB  15 TiB   13.26  0.51  22  up
>> 39  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   205 MiB  23 GiB   11 TiB   36.19  1.39  25  up
>> 40  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   9 KiB    12 GiB   14 TiB   20.88  0.80  14  up
>> 41  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   8.2 GiB  23 GiB   11 TiB   36.11  1.38  20  up
>> 42  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   55 KiB   12 GiB   14 TiB   20.91  0.80  16  up
>> 43  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   70 MiB   23 GiB   11 TiB   36.17  1.39  21  up
>> 44  hdd  17.34140  1.00000  17 TiB  7.6 TiB   6.6 TiB   18 KiB   28 GiB   9.8 TiB  43.75  1.68  16  up
>> 45  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   141 MiB  6.5 GiB  15 TiB   13.29  0.51  17  up
>> 46  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   1.7 MiB  6.4 GiB  15 TiB   13.27  0.51  15  up
>> 47  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   3.5 GiB  11 GiB   14 TiB   20.89  0.80  22  up
>> 48  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   9 KiB    6.3 GiB  15 TiB   13.25  0.51  10  up
>> 49  hdd  17.34140  1.00000  17 TiB  8.9 TiB   7.9 TiB   4 KiB    33 GiB   8.4 TiB  51.41  1.97  18  up
>> 50  hdd  17.34140  1.00000  17 TiB  7.6 TiB   6.6 TiB   212 MiB  31 GiB   9.7 TiB  43.81  1.68  20  up
>> 51  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.6 TiB   85 MiB   13 GiB   14 TiB   20.87  0.80  19  up
>> 52  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   5.4 GiB  6.0 GiB  15 TiB   13.34  0.51  18  up
>> 53  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   25 MiB   19 GiB   12 TiB   28.55  1.09  16  up
>> 54  hdd  17.34140  1.00000  17 TiB  6.2 TiB   5.3 TiB   198 MiB  23 GiB   11 TiB   35.99  1.38  14  up
>> 55  hdd  17.34140  1.00000  17 TiB  5.0 TiB   4.0 TiB   10 GiB   18 GiB   12 TiB   28.59  1.09  26  up
>> 56  hdd  17.34140  1.00000  17 TiB  6.3 TiB   5.3 TiB   153 MiB  24 GiB   11 TiB   36.14  1.38  22  up
>> 57  hdd  17.34140  1.00000  17 TiB  3.6 TiB   2.7 TiB   58 KiB   12 GiB   14 TiB   20.91  0.80  13  up
>> 58  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   3.3 GiB  6.4 GiB  15 TiB   13.23  0.51  11  up
>> 59  hdd  17.34140  1.00000  17 TiB  2.3 TiB   1.3 TiB   19 KiB   6.3 GiB  15 TiB   13.27  0.51  11  up
>>     TOTAL               1.0 PiB  272 TiB   213 TiB   84 GiB   942 GiB  769 TiB  26.11
>>
>> root@hbgt-ceph1-mon3:/# ceph osd dump | grep pool
>> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 15503 lfor 0/8533/8531 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
>> pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8321 lfor 0/8321/8319 flags hashpspool stripe_width 0 application rgw
>> pool 3 'bkp365-ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8297 lfor 0/8297/8295 flags hashpspool stripe_width 0 application rgw
>> pool 4 'bkp365-ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8054 lfor 0/8054/8052 flags hashpspool stripe_width 0 application rgw
>> pool 5 'bkp365-ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3412 lfor 0/3412/3410 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
>> pool 6 'bkp365-ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3500 lfor 0/0/2720 flags hashpspool stripe_width 12288 application rgw
>> pool 7 'bkp365-ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3436 lfor 0/3436/3434 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
>> pool 9 'ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14975 lfor 0/0/14973 flags hashpspool stripe_width 12288 application rgw
>> pool 10 'ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14979 flags hashpspool stripe_width 0 application rgw
>> pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14981 flags hashpspool stripe_width 0 application rgw
>> pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105 lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
>> pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236 lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
>> pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 15241 flags hashpspool stripe_width 0 application rgw
>>
>> (EC32 is an erasure-coding profile with 3 data chunks and 2 coding chunks.)
>>
>> No output with "ceph osd pool autoscale-status".
>>
>> On Thu, Mar 2, 2023 at 3:02 PM Curt <lightspd@xxxxxxxxx> wrote:
>>
>>> Forgot to do a reply-all.
>>>
>>> What does
>>>
>>> ceph osd df
>>> ceph osd dump | grep pool
>>>
>>> return?
>>>
>>> Are you using autoscaling? 289 PGs with 272 TB of data and 60 OSDs; that
>>> seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm thinking of
>>> this wrong.
>>>
>>> On Thu, Mar 2, 2023, 17:37 Joffrey <joff.au@xxxxxxxxx> wrote:
>>>
>>>> My Ceph version is 17.2.5 and all osd_scrub* settings are at their
>>>> defaults. I tried some changes to osd_max_backfills but saw no difference.
>>>> I have many HDDs with NVMe for DB, and everything is connected on a 25G
>>>> network.
>>>>
>>>> Yes, it has been the same PG for 4 days.
>>>>
>>>> I had an HDD failure and went through many days of recovery+backfilling
>>>> over the last 2 weeks. Perhaps the 'not in time' warning is related to
>>>> this.
>>>>
>>>> 'Jof
>>>>
>>>> On Thu, Mar 2, 2023 at 2:25 PM Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> > Run `ceph health detail`.
>>>> >
>>>> > Is it the same PG backfilling for a long time, or a different one over
>>>> > time?
>>>> >
>>>> > That it's remapped makes me think that what you're seeing is the
>>>> > balancer doing its job.
>>>> >
>>>> > As far as the scrubbing, do you limit the times when scrubbing can
>>>> > happen? Are these HDDs? EC?
>>>> >
>>>> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
>>>> > >
>>>> > > Hi,
>>>> > >
>>>> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
>>>> > > remapped+backfilling, and I don't understand why this backfilling is
>>>> > > taking so long.
>>>> > >
>>>> > > root@hbgt-ceph1-mon3:/# ceph -s
>>>> > >   cluster:
>>>> > >     id:     c300532c-51fa-11ec-9a41-0050569c3b55
>>>> > >     health: HEALTH_WARN
>>>> > >             15 pgs not deep-scrubbed in time
>>>> > >             13 pgs not scrubbed in time
>>>> > >
>>>> > >   services:
>>>> > >     mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
>>>> > >     mgr: hbgt-ceph1-mon2.nteihj (active, since 2d), standbys: hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
>>>> > >     osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
>>>> > >     rgw: 3 daemons active (3 hosts, 2 zones)
>>>> > >
>>>> > >   data:
>>>> > >     pools:   13 pools, 289 pgs
>>>> > >     objects: 67.74M objects, 127 TiB
>>>> > >     usage:   272 TiB used, 769 TiB / 1.0 PiB avail
>>>> > >     pgs:     288 active+clean
>>>> > >              1   active+remapped+backfilling
>>>> > >
>>>> > >   io:
>>>> > >     client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
>>>> > >     recovery: 790 KiB/s, 0 objects/s
>>>> > >
>>>> > > What can I do to understand this slow recovery (is it the backfill
>>>> > > action?)
>>>> > >
>>>> > > Thank you
>>>> > >
>>>> > > 'Jof
>>>> > > _______________________________________________
>>>> > > ceph-users mailing list -- ceph-users@xxxxxxx
>>>> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>> >
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
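
For reference, a minimal sketch of the pg_num change discussed at the top of
the thread, assuming the pool names from the `ceph osd dump` output above and
assuming, as Curt guessed, that bkp365-ncy.rgw.buckets.data (EC 3+2, i.e. size
5) carries most of the data. The legacy rule of thumb gives 60 OSDs * 100 / 5
= 1200, which rounds to the power of two 1024; Curt's 2048 lands closer to
~100 PGs per OSD once the second data pool and the small metadata pools are
counted. The target value below is illustrative, not something agreed in the
thread.

    # Keep the autoscaler from undoing the manual change on this pool.
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_autoscale_mode off

    # Legacy sizing: OSDs * 100 / size = 60 * 100 / 5 = 1200 -> 1024 (nearest power of two).
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_num 1024
    ceph osd pool set bkp365-ncy.rgw.buckets.data pgp_num 1024

    # Recent releases apply pg_num increases gradually; watch the splits and backfill.
    ceph osd pool get bkp365-ncy.rgw.buckets.data pg_num
    ceph -s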
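
On the slow backfill and the osd_max_backfills changes that seemed to have no
effect: a short sketch of standard commands for confirming what the OSDs are
actually running with. On Quincy (17.2.x) the mClock scheduler can limit
backfill and recovery regardless of osd_max_backfills, so the effective
behaviour may not match what was set; the value 3 below is only an example.

    # Confirm the effective values on a live OSD.
    ceph config show osd.0 osd_max_backfills
    ceph config show osd.0 osd_scrub_max_interval
    ceph config show osd.0 osd_deep_scrub_interval

    # Example: raise the backfill limit cluster-wide (may be capped by the mClock profile).
    ceph config set osd osd_max_backfills 3

    # Identify the PGs behind on scrub and the one that is backfilling.
    ceph health detail
    ceph pg dump pgs_brief | grep backfill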