Wow, thanks everyone for the amazing pointers - I hadn't come across the
osd op queue settings before!

At the moment the cluster is configured with

--------------------------------------------------------------------------------
[18:49:55] server4.place6:~# ceph config dump
WHO    MASK LEVEL    OPTION                          VALUE                  RO
global      advanced auth_client_required            cephx                  *
global      advanced auth_cluster_required           cephx                  *
global      advanced auth_service_required           cephx                  *
global      advanced cluster_network                 2a0a:e5c0:2:1::/64     *
global      advanced ms_bind_ipv4                    false
global      advanced ms_bind_ipv6                    true
global      advanced osd_class_update_on_start       false
global      advanced osd_pool_default_size           3
global      advanced public_network                  2a0a:e5c0:2:1::/64     *
mgr         advanced mgr/balancer/active             1
mgr         unknown  mgr/balancer/max_misplaced      .01                    *
mgr         advanced mgr/balancer/mode               upmap
mgr         advanced mgr/prometheus/rbd_stats_pools  hdd,ssd,xruk-ssd-pool  *
osd         advanced osd_max_backfills               1
osd         advanced osd_recovery_max_active         1
osd         advanced osd_recovery_op_priority        1
--------------------------------------------------------------------------------

And looking at a random osd:

[18:54:04] server4.place6:~# ceph daemon osd.7 config show | grep osd | grep -e 'osd_op_queue"' -e osd_op_queue_cut_off
    "osd_op_queue": "wpq",
    "osd_op_queue_cut_off": "low",

This might finally explain it! Thanks a lot everyone for the pointer!
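For my own notes, here is a rough sketch of what I intend to try (untested on
our side yet; as far as I understand, osd_op_queue_cut_off is only read at OSD
start, so the daemons need a restart to pick it up, and the ceph-osd.target
restart below assumes plain systemd/package-based OSDs rather than containers):

--------------------------------------------------------------------------------
# store the new value in the cluster configuration database
ceph config set osd osd_op_queue_cut_off high

# check that it has been recorded
ceph config dump | grep osd_op_queue_cut_off

# restart the OSDs (one host at a time) so they pick up the new value
systemctl restart ceph-osd.target

# and verify on a running daemon afterwards
ceph daemon osd.7 config show | grep osd_op_queue_cut_off
--------------------------------------------------------------------------------

If the option can in fact be injected at runtime without a restart, corrections
are very welcome.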
With that, I'm going to stop work happily for today!

Cheers,

Nico


Peter Lieven <pl@xxxxxxx> writes:

> Have you tried setting
>
> osd op queue cut off to high?
>
> Peter
>
>
>> Am 11.08.2021 um 15:24 schrieb Frank Schilder <frans@xxxxxx>:
>>
>> The recovery_sleep options are the next choice to look at. Increase them
>> and clients will get more I/O time slots. However, with your settings,
>> I'm surprised clients are impacted at all. I usually leave the
>> op-priority at its default and use osd-max-backfill=2..4 for HDDs. With
>> this, clients usually don't notice anything. I'm running mimic 13.2.10
>> though.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
>> Sent: 11 August 2021 10:08:34
>> To: Ceph Users
>> Subject: Very slow I/O during rebalance - options to tune?
>>
>> Good morning,
>>
>> after removing 3 osds which had been dead for some time, rebalancing
>> started this morning and makes client I/O really slow (in the
>> 10~30 MB/s area!). Rebalancing started at 1.2 ~ 1.6 GB/s; after issuing
>>
>> ceph tell 'osd.*' injectargs --osd-max-backfills=1 --osd-recovery-max-active=1 --osd-recovery-op-priority=1
>>
>> the rebalance came down to ~800 MB/s.
>>
>> The default osd-recovery-op-priority is 2 in our clusters, so way below
>> the client priority.
>>
>> In some moments (see ceph -s output below) some particular osds are
>> shown as slow, but that is not tied to one host or one osd; it seems to
>> move around the cluster.
>>
>> Are there any other ways to prioritize client traffic over rebalancing?
>>
>> We don't want to stop the rebalance completely, but even with the above
>> settings it seems that client I/O is sacrificed almost completely.
>>
>> Our cluster version is 14.2.16.
>>
>> Best regards,
>>
>> Nico
>>
>> --------------------------------------------------------------------------------
>>
>>   cluster:
>>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>>     health: HEALTH_WARN
>>             5 slow ops, oldest one blocked for 155 sec, daemons [osd.12,osd.41] have slow ops.
>>
>>   services:
>>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>>     osd: 104 osds: 104 up (since 46m), 104 in (since 4w); 365 remapped pgs
>>
>>   data:
>>     pools:   4 pools, 2624 pgs
>>     objects: 47.67M objects, 181 TiB
>>     usage:   550 TiB used, 215 TiB / 765 TiB avail
>>     pgs:     6034480/142997898 objects misplaced (4.220%)
>>              2259 active+clean
>>              315  active+remapped+backfill_wait
>>              50   active+remapped+backfilling
>>
>>   io:
>>     client:   15 MiB/s rd, 26 MiB/s wr, 559 op/s rd, 617 op/s wr
>>     recovery: 782 MiB/s, 196 objects/s
>>
>> ... and a little later:
>>
>> [10:06:32] server6.place6:~# ceph -s
>>   cluster:
>>     id:     1ccd84f6-e362-4c50-9ffe-59436745e445
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 5 daemons, quorum server2,server8,server6,server4,server18 (age 4w)
>>     mgr: server2(active, since 4w), standbys: server4, server6, server8, server18
>>     osd: 104 osds: 104 up (since 59m), 104 in (since 4w); 349 remapped pgs
>>
>>   data:
>>     pools:   4 pools, 2624 pgs
>>     objects: 47.67M objects, 181 TiB
>>     usage:   550 TiB used, 214 TiB / 765 TiB avail
>>     pgs:     5876676/143004876 objects misplaced (4.109%)
>>              2275 active+clean
>>              303  active+remapped+backfill_wait
>>              46   active+remapped+backfilling
>>
>>   io:
>>     client:   3.6 MiB/s rd, 25 MiB/s wr, 704 op/s rd, 726 op/s wr
>>     recovery: 776 MiB/s, 0 keys/s, 195 objects/s
>>
>>
>> --
>> Sustainable and modern Infrastructures by ungleich.ch

--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx