root@hbgt-ceph1-mon3:/# ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 1  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  11 KiB   23 GiB   11 TiB   36.17  1.39   17  up
 3  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  3.7 GiB  17 GiB   12 TiB   28.47  1.09   11  up
 5  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  3.2 GiB  12 GiB   14 TiB   20.89  0.80   13  up
 7  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  3.2 GiB  6.9 GiB  15 TiB   13.32  0.51   19  up
 9  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  68 MiB   18 GiB   12 TiB   28.53  1.09   18  up
11  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  403 MiB  23 GiB   11 TiB   36.13  1.38   17  up
13  hdd    17.34140   1.00000  17 TiB  1001 GiB  7.1 GiB  9.9 MiB  1.1 GiB  16 TiB    5.64  0.22   18  up
15  hdd    17.34140   1.00000  17 TiB  8.9 TiB   7.9 TiB  842 KiB  34 GiB   8.4 TiB  51.41  1.97   18  up
17  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  24 KiB   12 GiB   14 TiB   20.90  0.80   17  up
19  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  4.1 GiB  6.2 GiB  15 TiB   13.31  0.51   18  up
21  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  206 MiB  17 GiB   12 TiB   28.55  1.09   23  up
23  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  4.2 GiB  17 GiB   12 TiB   28.54  1.09   14  up
 0  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  7.2 GiB  12 GiB   14 TiB   20.94  0.80   18  up
 2  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  18 KiB   12 GiB   14 TiB   20.93  0.80   13  up
 4  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  3.0 GiB  12 GiB   14 TiB   20.95  0.80   20  up
 6  hdd    17.34140   1.00000  17 TiB  8.9 TiB   7.9 TiB  4.4 MiB  34 GiB   8.4 TiB  51.36  1.97   17  up
 8  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  965 KiB  6.5 GiB  15 TiB   13.26  0.51   14  up
10  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  18 KiB   6.5 GiB  15 TiB   13.25  0.51   13  up
12  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  98 MiB   17 GiB   12 TiB   28.49  1.09   16  up
14  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  4.2 GiB  17 GiB   12 TiB   28.55  1.09   20  up
16  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  24 KiB   12 GiB   14 TiB   20.94  0.80   20  up
18  hdd    17.34140   1.00000  17 TiB  8.9 TiB   7.9 TiB  17 MiB   34 GiB   8.4 TiB  51.42  1.97   19  up
20  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  3.2 GiB  17 GiB   12 TiB   28.50  1.09   18  up
22  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  2.7 GiB  6.2 GiB  15 TiB   13.25  0.51   11  up
24  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  70 MiB   17 GiB   12 TiB   28.50  1.09   18  up
25  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  3.0 GiB  17 GiB   12 TiB   28.51  1.09   16  up
26  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  3.0 GiB  23 GiB   11 TiB   36.13  1.38   15  up
27  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  205 MiB  17 GiB   12 TiB   28.59  1.10   16  up
28  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  1.0 MiB  6.3 GiB  15 TiB   13.27  0.51   12  up
29  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  1.3 MiB  17 GiB   12 TiB   28.50  1.09    4  up
30  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  379 KiB  23 GiB   11 TiB   36.14  1.38   16  up
31  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  2.5 MiB  12 GiB   14 TiB   20.92  0.80   19  up
32  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  11 MiB   12 GiB   14 TiB   20.93  0.80   16  up
33  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  18 KiB   12 GiB   14 TiB   20.91  0.80   17  up
34  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  71 MiB   23 GiB   11 TiB   36.15  1.38   19  up
35  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  3.3 GiB  6.3 GiB  15 TiB   13.28  0.51   14  up
36  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  0 B      17 GiB   12 TiB   28.59  1.09   13  up
37  hdd    17.34140   1.00000  17 TiB  4.9 TiB   4.0 TiB  69 MiB   17 GiB   12 TiB   28.54  1.09   12  up
38  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  2.9 GiB  6.7 GiB  15 TiB   13.26  0.51   22  up
39  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  205 MiB  23 GiB   11 TiB   36.19  1.39   25  up
40  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  9 KiB    12 GiB   14 TiB   20.88  0.80   14  up
41  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  8.2 GiB  23 GiB   11 TiB   36.11  1.38   20  up
42  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  55 KiB   12 GiB   14 TiB   20.91  0.80   16  up
43  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  70 MiB   23 GiB   11 TiB   36.17  1.39   21  up
44  hdd    17.34140   1.00000  17 TiB  7.6 TiB   6.6 TiB  18 KiB   28 GiB   9.8 TiB  43.75  1.68   16  up
45  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  141 MiB  6.5 GiB  15 TiB   13.29  0.51   17  up
46  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  1.7 MiB  6.4 GiB  15 TiB   13.27  0.51   15  up
47  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  3.5 GiB  11 GiB   14 TiB   20.89  0.80   22  up
48  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  9 KiB    6.3 GiB  15 TiB   13.25  0.51   10  up
49  hdd    17.34140   1.00000  17 TiB  8.9 TiB   7.9 TiB  4 KiB    33 GiB   8.4 TiB  51.41  1.97   18  up
50  hdd    17.34140   1.00000  17 TiB  7.6 TiB   6.6 TiB  212 MiB  31 GiB   9.7 TiB  43.81  1.68   20  up
51  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.6 TiB  85 MiB   13 GiB   14 TiB   20.87  0.80   19  up
52  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  5.4 GiB  6.0 GiB  15 TiB   13.34  0.51   18  up
53  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  25 MiB   19 GiB   12 TiB   28.55  1.09   16  up
54  hdd    17.34140   1.00000  17 TiB  6.2 TiB   5.3 TiB  198 MiB  23 GiB   11 TiB   35.99  1.38   14  up
55  hdd    17.34140   1.00000  17 TiB  5.0 TiB   4.0 TiB  10 GiB   18 GiB   12 TiB   28.59  1.09   26  up
56  hdd    17.34140   1.00000  17 TiB  6.3 TiB   5.3 TiB  153 MiB  24 GiB   11 TiB   36.14  1.38   22  up
57  hdd    17.34140   1.00000  17 TiB  3.6 TiB   2.7 TiB  58 KiB   12 GiB   14 TiB   20.91  0.80   13  up
58  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  3.3 GiB  6.4 GiB  15 TiB   13.23  0.51   11  up
59  hdd    17.34140   1.00000  17 TiB  2.3 TiB   1.3 TiB  19 KiB   6.3 GiB  15 TiB   13.27  0.51   11  up
                       TOTAL   1.0 PiB  272 TiB  213 TiB  84 GiB   942 GiB  769 TiB  26.11
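The spread in that output is worth noting: %USE ranges from 5.64 to 51.42 and VAR from 0.22 to 1.97. A quick way to summarize it (a rough sketch, not from the original message, assuming the column layout above where STATUS is the last field, PGS the second-to-last and %USE the fourth-from-last):

    ceph osd df | awk '$1 ~ /^[0-9]+$/ && $NF == "up" {
        use = $(NF-3); pgs = $(NF-1)            # %USE and PGS columns
        if (minuse == "" || use < minuse) minuse = use
        if (use > maxuse) maxuse = use
        sum += pgs; n++
      }
      END { printf "%%USE min/max: %s / %s, avg PGs per OSD: %.1f\n", minuse, maxuse, sum/n }'

With only around 17 PGs per OSD, each close to 1 TiB, the balancer has very coarse units to move around, which is consistent with that wide spread.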
root@hbgt-ceph1-mon3:/# ceph osd dump | grep pool
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 15503 lfor 0/8533/8531 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8321 lfor 0/8321/8319 flags hashpspool stripe_width 0 application rgw
pool 3 'bkp365-ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8297 lfor 0/8297/8295 flags hashpspool stripe_width 0 application rgw
pool 4 'bkp365-ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8054 lfor 0/8054/8052 flags hashpspool stripe_width 0 application rgw
pool 5 'bkp365-ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3412 lfor 0/3412/3410 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'bkp365-ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3500 lfor 0/0/2720 flags hashpspool stripe_width 12288 application rgw
pool 7 'bkp365-ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3436 lfor 0/3436/3434 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 9 'ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14975 lfor 0/0/14973 flags hashpspool stripe_width 12288 application rgw
pool 10 'ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14979 flags hashpspool stripe_width 0 application rgw
pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14981 flags hashpspool stripe_width 0 application rgw
pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105 lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236 lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 15241 flags hashpspool stripe_width 0 application rgw

(EC32 is an erasure-code profile with 3 data chunks and 2 coding chunks.)

"ceph osd pool autoscale-status" returns no output.
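As a back-of-the-envelope cross-check of Curt's PG-count question below: summing pg_num x size over the pools above gives 1x3 + 6x32x3 + 4x8x3 + 2x32x5 = 995 PG replicas, or roughly 17 per OSD across the 60 OSDs, which matches the PGS column in ceph osd df and is well below the commonly cited target of around 100 PGs per OSD. A sketch of the same calculation (not from the original message; the 60 is the OSD count taken from the output above, and the field matching assumes the token layout of ceph osd dump shown here):

    ceph osd dump | awk '/^pool/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "size")   size = $(i + 1)   # replica count, or k+m for EC pools
            if ($i == "pg_num") pgs  = $(i + 1)
        }
        total += size * pgs
      }
      END { printf "%d PG replicas -> ~%.1f per OSD\n", total, total / 60 }'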
On Thu, Mar 2, 2023 at 15:02, Curt <lightspd@xxxxxxxxx> wrote:

> Forgot to do a reply all.
>
> What does
>
> ceph osd df
> ceph osd dump | grep pool
>
> return?
>
> Are you using autoscaling? 289 PGs with 272 TB of data and 60 OSDs, that
> seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm thinking of
> this wrong.
>
> On Thu, Mar 2, 2023, 17:37 Joffrey <joff.au@xxxxxxxxx> wrote:
>
>> My Ceph version is 17.2.5 and all osd_scrub* settings are at their
>> defaults. I tried some changes to osd-max-backfills but saw no effect.
>> I have many HDDs with NVMe for the DB, and everything is connected over
>> a 25G network.
>>
>> Yes, it has been the same PG for 4 days.
>>
>> I had an HDD failure and went through many days of recovery+backfilling
>> over the last 2 weeks. Perhaps the 'not in time' warnings are related to
>> this.
>>
>> 'Jof
>>
>> On Thu, Mar 2, 2023 at 14:25, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>>
>> > Run `ceph health detail`.
>> >
>> > Is it the same PG backfilling for a long time, or a different one over
>> > time?
>> >
>> > That it's remapped makes me think that what you're seeing is the
>> > balancer doing its job.
>> >
>> > As far as the scrubbing, do you limit the times when scrubbing can
>> > happen? Are these HDDs? EC?
>> >
>> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
>> > > remapped+backfilling, and I don't understand why this backfilling
>> > > is taking so long.
>> > >
>> > > root@hbgt-ceph1-mon3:/# ceph -s
>> > >   cluster:
>> > >     id:     c300532c-51fa-11ec-9a41-0050569c3b55
>> > >     health: HEALTH_WARN
>> > >             15 pgs not deep-scrubbed in time
>> > >             13 pgs not scrubbed in time
>> > >
>> > >   services:
>> > >     mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
>> > >     mgr: hbgt-ceph1-mon2.nteihj (active, since 2d), standbys:
>> > >          hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
>> > >     osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
>> > >     rgw: 3 daemons active (3 hosts, 2 zones)
>> > >
>> > >   data:
>> > >     pools:   13 pools, 289 pgs
>> > >     objects: 67.74M objects, 127 TiB
>> > >     usage:   272 TiB used, 769 TiB / 1.0 PiB avail
>> > >     pgs:     288 active+clean
>> > >              1   active+remapped+backfilling
>> > >
>> > >   io:
>> > >     client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
>> > >     recovery: 790 KiB/s, 0 objects/s
>> > >
>> > > What can I do to understand this slow recovery (is it the backfill
>> > > action?)
>> > >
>> > > Thank you
>> > >
>> > > 'Jof
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
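For completeness, a sketch of generic commands for digging into a single slow backfill like the one above (not from the thread itself; the PG id 6.1f and osd.0 are placeholders for whatever ceph health detail actually reports):

    ceph health detail                           # which PG is backfilling, which PGs are behind on scrubs
    ceph pg dump pgs_brief | grep backfill       # the backfilling PG and its up/acting OSD sets
    ceph pg 6.1f query                           # placeholder PG id; check recovery_state for what it is waiting on
    ceph config show osd.0 osd_max_backfills     # effective value on one of the acting OSDs
    ceph config show osd.0 osd_mclock_profile    # on Quincy the mClock profile largely decides recovery speed
    ceph config show osd.0 osd_scrub_begin_hour  # scrub time window, if any (likewise osd_scrub_end_hour)

On 17.2.5 the default mclock_scheduler tends to throttle background recovery in favour of client I/O, which would also explain why changing osd-max-backfills appeared to have no effect; switching the profile (ceph config set osd osd_mclock_profile high_recovery_ops) is the usual way to give backfill more bandwidth, at the cost of client traffic.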