Have you tried setting osd op queue cut off to high?

Peter

> On 11.08.2021, at 15:24, Frank Schilder <frans@xxxxxx> wrote:
>
> The recovery_sleep options are the next choice to look at. Increase them
> and clients will get more I/O time slots. However, with your settings I'm
> surprised clients are impacted at all. I usually leave the op priority at
> its default and use osd-max-backfills=2..4 for HDDs. With this, clients
> usually don't notice anything. I'm running mimic 13.2.10, though.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
> Sent: 11 August 2021 10:08:34
> To: Ceph Users
> Subject: Very slow I/O during rebalance - options to tune?
>
> Good morning,
>
> after removing 3 OSDs which had been dead for some time, rebalancing
> started this morning and is making client I/O really slow (in the
> 10-30 MB/s range!). Rebalancing started at 1.2-1.6 GB/s; after issuing
>
> ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=1 --osd-recovery-op-priority=1
>
> the rebalance came down to ~800 MB/s.
>
> The default osd-recovery-op-priority is 2 in our clusters, so already
> way below the client priority.
>
> At times (see the ceph -s output below) particular OSDs are shown as
> slow, but that is not confined to one host or one OSD; it seems to move
> around the cluster.
>
> Are there any other ways to prioritize client traffic over rebalancing?
>
> We don't want to stop the rebalance completely, but it seems that even
> with the above settings client I/O is sacrificed almost completely.
>
> Our cluster version is 14.2.16.
>
> Best regards,
>
> Nico
>
> --------------------------------------------------------------------------------
>
>   cluster:
>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>     health: HEALTH_WARN
>             5 slow ops, oldest one blocked for 155 sec, daemons [osd.12,osd.41] have slow ops.
>
>   services:
>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>     osd: 104 osds: 104 up (since 46m), 104 in (since 4w); 365 remapped pgs
>
>   data:
>     pools:   4 pools, 2624 pgs
>     objects: 47.67M objects, 181 TiB
>     usage:   550 TiB used, 215 TiB / 765 TiB avail
>     pgs:     6034480/142997898 objects misplaced (4.220%)
>              2259 active+clean
>              315  active+remapped+backfill_wait
>              50   active+remapped+backfilling
>
>   io:
>     client:   15 MiB/s rd, 26 MiB/s wr, 559 op/s rd, 617 op/s wr
>     recovery: 782 MiB/s, 196 objects/s
>
> ... and a little later:
>
> [10:06:32] server6.place6:~# ceph -s
>   cluster:
>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>     health: HEALTH_OK
>
>   services:
>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>     osd: 104 osds: 104 up (since 59m), 104 in (since 4w); 349 remapped pgs
>
>   data:
>     pools:   4 pools, 2624 pgs
>     objects: 47.67M objects, 181 TiB
>     usage:   550 TiB used, 214 TiB / 765 TiB avail
>     pgs:     5876676/143004876 objects misplaced (4.109%)
>              2275 active+clean
>              303  active+remapped+backfill_wait
>              46   active+remapped+backfilling
>
>   io:
>     client:   3.6 MiB/s rd, 25 MiB/s wr, 704 op/s rd, 726 op/s wr
>     recovery: 776 MiB/s, 0 keys/s, 195 objects/s
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
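
A rough sketch of how the knobs discussed above could be inspected and
adjusted on a 14.x cluster. The values are examples only, not
recommendations, osd.12 is just a placeholder for any OSD, and as far as I
know osd_op_queue_cut_off is only read when an OSD starts, so plan for a
restart after changing it:

    # what is an OSD currently running with? (run on that OSD's host,
    # via the admin socket)
    ceph daemon osd.12 config get osd_op_queue_cut_off
    ceph daemon osd.12 config get osd_max_backfills
    ceph daemon osd.12 config get osd_recovery_sleep_hdd

    # prefer client ops over recovery/backfill ops in the op queue;
    # stored in the mon config database, picked up by OSDs on restart
    ceph config set osd osd_op_queue_cut_off high

    # throttle recovery further at runtime: keep backfills low and add
    # sleep between recovery ops on HDDs (example values)
    ceph tell 'osd.*' injectargs '--osd_max_backfills=1 --osd_recovery_sleep_hdd=0.2'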