Wow, thanks everyone for the amazing pointers - I hadn't come across the
osd op queue settings before!

At the moment the cluster is configured with

--------------------------------------------------------------------------------
[18:49:55] server4.place6:~# ceph config dump
WHO    MASK LEVEL    OPTION                          VALUE                  RO
global      advanced auth_client_required            cephx                  *
global      advanced auth_cluster_required           cephx                  *
global      advanced auth_service_required           cephx                  *
global      advanced cluster_network                 2a0a:e5c0:2:1::/64     *
global      advanced ms_bind_ipv4                    false
global      advanced ms_bind_ipv6                    true
global      advanced osd_class_update_on_start       false
global      advanced osd_pool_default_size           3
global      advanced public_network                  2a0a:e5c0:2:1::/64     *
mgr         advanced mgr/balancer/active             1
mgr         unknown  mgr/balancer/max_misplaced      .01                    *
mgr         advanced mgr/balancer/mode               upmap
mgr         advanced mgr/prometheus/rbd_stats_pools  hdd,ssd,xruk-ssd-pool  *
osd         advanced osd_max_backfills               1
osd         advanced osd_recovery_max_active         1
osd         advanced osd_recovery_op_priority        1
--------------------------------------------------------------------------------

And looking at a random osd:

[18:54:04] server4.place6:~# ceph daemon osd.7 config show | grep osd | grep -e 'osd_op_queue"' -e osd_op_queue_cut_off
    "osd_op_queue": "wpq",
    "osd_op_queue_cut_off": "low",

This might finally explain it! Thanks a lot everyone for the pointer!
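For my own notes, here is a rough sketch of what I intend to try (untested on
our side yet; as far as I understand, osd_op_queue_cut_off is only read at OSD
start, so the daemons need a restart to pick it up, and the ceph-osd.target
restart below assumes plain systemd/package-based OSDs rather than containers):

--------------------------------------------------------------------------------
# store the new value in the cluster configuration database
ceph config set osd osd_op_queue_cut_off high

# check that it has been recorded
ceph config dump | grep osd_op_queue_cut_off

# restart the OSDs (one host at a time) so they pick up the new value
systemctl restart ceph-osd.target

# and verify on a running daemon afterwards
ceph daemon osd.7 config show | grep osd_op_queue_cut_off
--------------------------------------------------------------------------------

If the option can in fact be injected at runtime without a restart, corrections
are very welcome.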
With that, I'm going to stop work happily for today!

Cheers,

Nico


Peter Lieven <pl@xxxxxxx> writes:

> Have you tried setting
>
> osd op queue cut off to high?
>
> Peter
>
>
>> Am 11.08.2021 um 15:24 schrieb Frank Schilder <frans@xxxxxx>:
>>
>> The recovery_sleep options are the next choice to look at. Increase them
>> and clients will get more I/O time slots. However, with your settings,
>> I'm surprised clients are impacted at all. I usually leave the
>> op-priority at its default and use osd-max-backfill=2..4 for HDDs. With
>> this, clients usually don't notice anything. I'm running mimic 13.2.10
>> though.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
>> Sent: 11 August 2021 10:08:34
>> To: Ceph Users
>> Subject: Very slow I/O during rebalance - options to tune?
>>
>> Good morning,
>>
>> after removing 3 osds which had been dead for some time, rebalancing
>> started this morning and makes client I/O really slow (in the
>> 10~30 MB/s area!). Rebalancing started at 1.2 ~ 1.6 GB/s; after issuing
>>
>> ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=1 --osd-recovery-op-priority=1
>>
>> the rebalance came down to ~800 MB/s.
>>
>> The default osd-recovery-op-priority is 2 in our clusters, so way below
>> the client priority.
>>
>> In some moments (see ceph -s output below) some particular osds are
>> shown as slow, but that is not tied to one host or one osd; it seems to
>> move around the cluster.
>>
>> Are there any other ways to prioritize client traffic over rebalancing?
>>
>> We don't want to stop the rebalance completely, but even with the above
>> settings it seems that client I/O is sacrificed almost completely.
>>
>> Our cluster version is 14.2.16.
>>
>> Best regards,
>>
>> Nico
>>
>> --------------------------------------------------------------------------------
>>
>>   cluster:
>>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>>     health: HEALTH_WARN
>>             5 slow ops, oldest one blocked for 155 sec, daemons [osd.12,osd.41] have slow ops.
>>
>>   services:
>>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>>     osd: 104 osds: 104 up (since 46m), 104 in (since 4w); 365 remapped pgs
>>
>>   data:
>>     pools:   4 pools, 2624 pgs
>>     objects: 47.67M objects, 181 TiB
>>     usage:   550 TiB used, 215 TiB / 765 TiB avail
>>     pgs:     6034480/142997898 objects misplaced (4.220%)
>>              2259 active+clean
>>              315  active+remapped+backfill_wait
>>              50   active+remapped+backfilling
>>
>>   io:
>>     client:   15 MiB/s rd, 26 MiB/s wr, 559 op/s rd, 617 op/s wr
>>     recovery: 782 MiB/s, 196 objects/s
>>
>> ... and a little later:
>>
>> [10:06:32] server6.place6:~# ceph -s
>>   cluster:
>>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>>     osd: 104 osds: 104 up (since 59m), 104 in (since 4w); 349 remapped pgs
>>
>>   data:
>>     pools:   4 pools, 2624 pgs
>>     objects: 47.67M objects, 181 TiB
>>     usage:   550 TiB used, 214 TiB / 765 TiB avail
>>     pgs:     5876676/143004876 objects misplaced (4.109%)
>>              2275 active+clean
>>              303  active+remapped+backfill_wait
>>              46   active+remapped+backfilling
>>
>>   io:
>>     client:   3.6 MiB/s rd, 25 MiB/s wr, 704 op/s rd, 726 op/s wr
>>     recovery: 776 MiB/s, 0 keys/s, 195 objects/s
>>
>>
>> --
>> Sustainable and modern Infrastructures by ungleich.ch

--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx