Re: Cannot get backfill speed up


 



Hi Jesper,

Indeed, many users have reported slow backfilling and recovery with the mclock
scheduler. This is supposed to be fixed in the latest Quincy release, but clearly
something is still slowing things down.
Some clusters have better luck reverting to osd_op_queue = wpq.
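
For example, a rough sketch (assuming a cephadm-managed cluster; note that
osd_op_queue only takes effect after the OSDs have been restarted):

  ceph config set osd osd_op_queue wpq
  ceph orch daemon restart osd.<id>   # repeat per OSD, ideally one failure domain at a time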

(I'm hoping that by proposing this, someone who has tuned mclock recently will
chime in with better advice.)
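
If you'd rather stay on mclock for now, another knob worth trying (I haven't
verified the effect on your exact Quincy point release) is the built-in profile
that prioritises recovery over client IO:

  ceph config set osd osd_mclock_profile high_recovery_ops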

Cheers, Dan

______________________________________________________
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com




On Wed, Jul 5, 2023 at 10:28 PM Jesper Krogh <jesper@xxxxxxxx> wrote:

>
> Hi.
>
> Fresh cluster - but despite setting:
> jskr@dkcphhpcmgt028:/$ sudo ceph config show osd.0 | grep recovery_max_active_ssd
> osd_recovery_max_active_ssd     50      mon     default[20]
> jskr@dkcphhpcmgt028:/$ sudo ceph config show osd.0 | grep osd_max_backfills
> osd_max_backfills               100     mon     default[10]
>
> I still get
> jskr@dkcphhpcmgt028:/$ sudo ceph status
>    cluster:
>      id:     5c384430-da91-11ed-af9c-c780a5227aff
>      health: HEALTH_OK
>
>    services:
>      mon: 3 daemons, quorum dkcphhpcmgt031,dkcphhpcmgt029,dkcphhpcmgt028
> (age 16h)
>      mgr: dkcphhpcmgt031.afbgjx(active, since 33h), standbys:
> dkcphhpcmgt029.bnsegi, dkcphhpcmgt028.bxxkqd
>      mds: 2/2 daemons up, 1 standby
>      osd: 40 osds: 40 up (since 45h), 40 in (since 39h); 21 remapped pgs
>
>    data:
>      volumes: 2/2 healthy
>      pools:   9 pools, 495 pgs
>      objects: 24.85M objects, 60 TiB
>      usage:   117 TiB used, 159 TiB / 276 TiB avail
>      pgs:     10655690/145764002 objects misplaced (7.310%)
>               474 active+clean
>               15  active+remapped+backfilling
>               6   active+remapped+backfill_wait
>
>    io:
>      client:   0 B/s rd, 1.4 MiB/s wr, 0 op/s rd, 116 op/s wr
>      recovery: 328 MiB/s, 108 objects/s
>
>    progress:
>      Global Recovery Event (9h)
>        [==========================..] (remaining: 25m)
>
> With these settings I would expect more than 15 PGs actively backfilling...
> (and given the SSDs and the 2x25 Gbit network, I can also afford to spend more
> resources on recovery than 328 MiB/s).
>
> Thanks.
>
> --
> Jesper Krogh
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



