Hello Joffrey,

I am not sure why my previous reply did not go through, so I am replying on
this thread again.

The slow backfill could also be caused by mclock_scheduler throttling
recovery operations too aggressively; this issue is currently being fixed.
To confirm whether mclock_scheduler is the cause here, could you please run
the following commands and post the output on this thread?

1. ceph versions
2. ceph config show osd.<id> | grep osd_max_backfills
3. ceph config show osd.<id> | grep osd_recovery_max_active
4. ceph config show osd.<id> | grep osd_mclock

With mclock_scheduler enabled on 17.2.5, it is not possible to override
recovery options such as osd_max_backfills, osd_recovery_max_active and the
other related settings. Based on the output of the commands above, I can
confirm whether mclock_scheduler is causing the issue and suggest next steps.
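If it helps, the same checks can be wrapped in a small shell loop; this is
only a sketch, and osd.0 below is a placeholder for whichever OSD id you pick
(any id from `ceph osd tree` will do):

    # Read-only checks; replace osd.0 with a real OSD id from your cluster.
    ceph versions
    for opt in osd_max_backfills osd_recovery_max_active osd_mclock; do
        ceph config show osd.0 | grep "$opt"
    done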
-Sridhar

On Fri, Mar 3, 2023 at 9:35 PM Joffrey <joff.au@xxxxxxxxx> wrote:

> Ok, thanks. You mean that the autoscale feature is... stupid?
> I'm going to change pgp_num and use the legacy formula: OSDs * 100 / pool
> size.
>
> On Thu, Mar 2, 2023 at 5:04 PM Curt <lightspd@xxxxxxxxx> wrote:
>
> > I see autoscale_mode on all pools and I'm guessing this is your largest
> > pool, bkp365-ncy.rgw.buckets.data, with 32 PGs. I would definitely turn
> > off autoscale and increase pg_num/pgp_num. Someone with more experience
> > than I can chime in, but I would think something like 2048 would be much
> > better.
> >
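For reference, a rough sketch of what that would look like for the large data
pool discussed above; the numbers are an illustration only (the legacy formula
gives 60 * 100 / 5 = 1200 for an EC 3+2 pool, so something in the 1024 to 2048
range), not a tested recommendation:

    # Legacy rule of thumb: (OSD count * 100) / pool size, rounded to a power of two.
    # Here: 60 OSDs * 100 / 5 (EC 3+2, size 5) = 1200, i.e. roughly 1024-2048 PGs.
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_autoscale_mode off
    ceph osd pool set bkp365-ncy.rgw.buckets.data pg_num 1024
    ceph osd pool set bkp365-ncy.rgw.buckets.data pgp_num 1024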
> > On Thu, Mar 2, 2023 at 6:12 PM Joffrey <joff.au@xxxxxxxxx> wrote:
> >
> >> root@hbgt-ceph1-mon3:/# ceph osd df
> >> ID  CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> >>  1  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   11 KiB   23 GiB   11 TiB  36.17  1.39   17  up
> >>  3  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.7 GiB   17 GiB   12 TiB  28.47  1.09   11  up
> >>  5  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.2 GiB   12 GiB   14 TiB  20.89  0.80   13  up
> >>  7  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.2 GiB  6.9 GiB   15 TiB  13.32  0.51   19  up
> >>  9  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   68 MiB   18 GiB   12 TiB  28.53  1.09   18  up
> >> 11  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  403 MiB   23 GiB   11 TiB  36.13  1.38   17  up
> >> 13  hdd  17.34140  1.00000  17 TiB  1001 GiB  7.1 GiB  9.9 MiB  1.1 GiB   16 TiB   5.64  0.22   18  up
> >> 15  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB  842 KiB   34 GiB  8.4 TiB  51.41  1.97   18  up
> >> 17  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.90  0.80   17  up
> >> 19  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  4.1 GiB  6.2 GiB   15 TiB  13.31  0.51   18  up
> >> 21  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  206 MiB   17 GiB   12 TiB  28.55  1.09   23  up
> >> 23  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.54  1.09   14  up
> >>  0  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  7.2 GiB   12 GiB   14 TiB  20.94  0.80   18  up
> >>  2  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.93  0.80   13  up
> >>  4  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.0 GiB   12 GiB   14 TiB  20.95  0.80   20  up
> >>  6  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB  4.4 MiB   34 GiB  8.4 TiB  51.36  1.97   17  up
> >>  8  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  965 KiB  6.5 GiB   15 TiB  13.26  0.51   14  up
> >> 10  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB   18 KiB  6.5 GiB   15 TiB  13.25  0.51   13  up
> >> 12  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   98 MiB   17 GiB   12 TiB  28.49  1.09   16  up
> >> 14  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.55  1.09   20  up
> >> 16  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.94  0.80   20  up
> >> 18  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB   17 MiB   34 GiB  8.4 TiB  51.42  1.97   19  up
> >> 20  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.2 GiB   17 GiB   12 TiB  28.50  1.09   18  up
> >> 22  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  2.7 GiB  6.2 GiB   15 TiB  13.25  0.51   11  up
> >> 24  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   70 MiB   17 GiB   12 TiB  28.50  1.09   18  up
> >> 25  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  3.0 GiB   17 GiB   12 TiB  28.51  1.09   16  up
> >> 26  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  3.0 GiB   23 GiB   11 TiB  36.13  1.38   15  up
> >> 27  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB  205 MiB   17 GiB   12 TiB  28.59  1.10   16  up
> >> 28  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  1.0 MiB  6.3 GiB   15 TiB  13.27  0.51   12  up
> >> 29  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB  1.3 MiB   17 GiB   12 TiB  28.50  1.09    4  up
> >> 30  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  379 KiB   23 GiB   11 TiB  36.14  1.38   16  up
> >> 31  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  2.5 MiB   12 GiB   14 TiB  20.92  0.80   19  up
> >> 32  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   11 MiB   12 GiB   14 TiB  20.93  0.80   16  up
> >> 33  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.91  0.80   17  up
> >> 34  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   71 MiB   23 GiB   11 TiB  36.15  1.38   19  up
> >> 35  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.3 GiB   15 TiB  13.28  0.51   14  up
> >> 36  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB      0 B   17 GiB   12 TiB  28.59  1.09   13  up
> >> 37  hdd  17.34140  1.00000  17 TiB   4.9 TiB  4.0 TiB   69 MiB   17 GiB   12 TiB  28.54  1.09   12  up
> >> 38  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  2.9 GiB  6.7 GiB   15 TiB  13.26  0.51   22  up
> >> 39  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  205 MiB   23 GiB   11 TiB  36.19  1.39   25  up
> >> 40  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB    9 KiB   12 GiB   14 TiB  20.88  0.80   14  up
> >> 41  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  8.2 GiB   23 GiB   11 TiB  36.11  1.38   20  up
> >> 42  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   55 KiB   12 GiB   14 TiB  20.91  0.80   16  up
> >> 43  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB   70 MiB   23 GiB   11 TiB  36.17  1.39   21  up
> >> 44  hdd  17.34140  1.00000  17 TiB   7.6 TiB  6.6 TiB   18 KiB   28 GiB  9.8 TiB  43.75  1.68   16  up
> >> 45  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  141 MiB  6.5 GiB   15 TiB  13.29  0.51   17  up
> >> 46  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  1.7 MiB  6.4 GiB   15 TiB  13.27  0.51   15  up
> >> 47  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB  3.5 GiB   11 GiB   14 TiB  20.89  0.80   22  up
> >> 48  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB    9 KiB  6.3 GiB   15 TiB  13.25  0.51   10  up
> >> 49  hdd  17.34140  1.00000  17 TiB   8.9 TiB  7.9 TiB    4 KiB   33 GiB  8.4 TiB  51.41  1.97   18  up
> >> 50  hdd  17.34140  1.00000  17 TiB   7.6 TiB  6.6 TiB  212 MiB   31 GiB  9.7 TiB  43.81  1.68   20  up
> >> 51  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.6 TiB   85 MiB   13 GiB   14 TiB  20.87  0.80   19  up
> >> 52  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  5.4 GiB  6.0 GiB   15 TiB  13.34  0.51   18  up
> >> 53  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB   25 MiB   19 GiB   12 TiB  28.55  1.09   16  up
> >> 54  hdd  17.34140  1.00000  17 TiB   6.2 TiB  5.3 TiB  198 MiB   23 GiB   11 TiB  35.99  1.38   14  up
> >> 55  hdd  17.34140  1.00000  17 TiB   5.0 TiB  4.0 TiB   10 GiB   18 GiB   12 TiB  28.59  1.09   26  up
> >> 56  hdd  17.34140  1.00000  17 TiB   6.3 TiB  5.3 TiB  153 MiB   24 GiB   11 TiB  36.14  1.38   22  up
> >> 57  hdd  17.34140  1.00000  17 TiB   3.6 TiB  2.7 TiB   58 KiB   12 GiB   14 TiB  20.91  0.80   13  up
> >> 58  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.4 GiB   15 TiB  13.23  0.51   11  up
> >> 59  hdd  17.34140  1.00000  17 TiB   2.3 TiB  1.3 TiB   19 KiB  6.3 GiB   15 TiB  13.27  0.51   11  up
> >>               TOTAL         1.0 PiB   272 TiB  213 TiB   84 GiB  942 GiB  769 TiB  26.11
> >>
> >> root@hbgt-ceph1-mon3:/# ceph osd dump | grep pool
> >> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 15503 lfor 0/8533/8531 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
> >> pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8321 lfor 0/8321/8319 flags hashpspool stripe_width 0 application rgw
> >> pool 3 'bkp365-ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8297 lfor 0/8297/8295 flags hashpspool stripe_width 0 application rgw
> >> pool 4 'bkp365-ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8054 lfor 0/8054/8052 flags hashpspool stripe_width 0 application rgw
> >> pool 5 'bkp365-ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3412 lfor 0/3412/3410 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 6 'bkp365-ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3500 lfor 0/0/2720 flags hashpspool stripe_width 12288 application rgw
> >> pool 7 'bkp365-ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3436 lfor 0/3436/3434 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 9 'ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14975 lfor 0/0/14973 flags hashpspool stripe_width 12288 application rgw
> >> pool 10 'ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14979 flags hashpspool stripe_width 0 application rgw
> >> pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14981 flags hashpspool stripe_width 0 application rgw
> >> pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105 lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236 lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
> >> pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 15241 flags hashpspool stripe_width 0 application rgw
> >>
> >> (EC32 is an erasure-code profile with 3 data chunks and 2 coding chunks.)
> >>
> >> "ceph osd pool autoscale-status" returns no output.
> >>
> >> On Thu, Mar 2, 2023 at 3:02 PM Curt <lightspd@xxxxxxxxx> wrote:
> >>
> >>> Forgot to do a reply all.
> >>>
> >>> What does
> >>>
> >>>   ceph osd df
> >>>   ceph osd dump | grep pool
> >>>
> >>> return?
> >>>
> >>> Are you using autoscaling? 289 PGs with 272 TB of data on 60 OSDs:
> >>> that seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm
> >>> thinking of this wrong.
> >>>
> >>> On Thu, Mar 2, 2023 at 5:37 PM Joffrey <joff.au@xxxxxxxxx> wrote:
> >>>
> >>>> My Ceph version is 17.2.5 and all osd_scrub* settings are at their
> >>>> defaults. I tried some changes to osd_max_backfills but saw no
> >>>> difference. The OSDs are HDDs with NVMe for the DB, and everything is
> >>>> connected over a 25G network.
> >>>>
> >>>> Yes, it has been the same PG for 4 days.
> >>>>
> >>>> I had an HDD failure and went through many days of
> >>>> recovery+backfilling over the last 2 weeks. Perhaps the 'not in time'
> >>>> warnings are related to this.
> >>>>
> >>>> 'Jof
> >>>>
> >>>> On Thu, Mar 2, 2023 at 2:25 PM Anthony D'Atri <aad@xxxxxxxxxxxxxx>
> >>>> wrote:
> >>>>
> >>>> > Run `ceph health detail`.
> >>>> >
> >>>> > Is it the same PG backfilling for a long time, or a different one
> >>>> > over time?
> >>>> >
> >>>> > That it's remapped makes me think that what you're seeing is the
> >>>> > balancer doing its job.
> >>>> >
> >>>> > As far as the scrubbing, do you limit the times when scrubbing can
> >>>> > happen? Are these HDDs? EC?
> >>>> >
> >>>> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
> >>>> > >
> >>>> > > Hi,
> >>>> > >
> >>>> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
> >>>> > > remapped+backfilling, and I don't understand why this backfill is
> >>>> > > taking so long.
> >>>> > >
> >>>> > > root@hbgt-ceph1-mon3:/# ceph -s
> >>>> > >   cluster:
> >>>> > >     id:     c300532c-51fa-11ec-9a41-0050569c3b55
> >>>> > >     health: HEALTH_WARN
> >>>> > >             15 pgs not deep-scrubbed in time
> >>>> > >             13 pgs not scrubbed in time
> >>>> > >
> >>>> > >   services:
> >>>> > >     mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
> >>>> > >     mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys: hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
> >>>> > >     osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
> >>>> > >     rgw: 3 daemons active (3 hosts, 2 zones)
> >>>> > >
> >>>> > >   data:
> >>>> > >     pools:   13 pools, 289 pgs
> >>>> > >     objects: 67.74M objects, 127 TiB
> >>>> > >     usage:   272 TiB used, 769 TiB / 1.0 PiB avail
> >>>> > >     pgs:     288 active+clean
> >>>> > >              1   active+remapped+backfilling
> >>>> > >
> >>>> > >   io:
> >>>> > >     client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
> >>>> > >     recovery: 790 KiB/s, 0 objects/s
> >>>> > >
> >>>> > > What can I do to understand this slow recovery (is it the backfill
> >>>> > > action)?
> >>>> > >
> >>>> > > Thank you
> >>>> > >
> >>>> > > 'Jof
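Putting the suggestions from this thread together, a minimal way to identify
and inspect the PG that is backfilling; the PG id 6.1f below is purely a
placeholder for whatever id the first two commands report:

    # List unhealthy PGs and the ones currently backfilling.
    ceph health detail
    ceph pg ls backfilling
    # Query the reported PG (replace 6.1f with the actual id) to see the
    # up/acting OSD sets and the recovery state.
    ceph pg 6.1f query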
--
Sridhar Seshasayee
Partner Engineer
Red Hat <https://www.redhat.com>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx