Hi,
on a 3-node hyper-converged Proxmox VE cluster with 12 SSD OSDs I am
experiencing stalls in RBD performance during normal backfill
operations, e.g. when moving a pool from 2/1 to 3/2 (size/min_size).
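For clarity, by "moving a pool from 2/1 to 3/2" I mean changing size and
min_size roughly like this ("rbd" here is only an example pool name):
# raise the replica count from 2 to 3 and min_size from 1 to 2
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2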
I was expecting that I could control the load caused by the backfilling
using
ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
or
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 1'
but neither had any noticeable effect. Even
ceph tell 'osd.*' config set osd_recovery_sleep_ssd 2.1
did not help.
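(In case it matters: checking whether such values actually reach the
running daemons should work with something like the following, with
osd.0 just as a representative example:
# value as seen by the mgr for the running daemon
ceph config show osd.0 osd_max_backfills
# value reported directly by the daemon itself
ceph tell osd.0 config get osd_recovery_sleep_ssd
The same settings could also be made persistent in the config database,
e.g. "ceph config set osd osd_max_backfills 1", instead of injectargs.)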
Any hints?
Normal operation looks like:
2022-11-19T16:16:52.142355+0000 mgr.pve-02 (mgr.18134134) 60414 : cluster [DBG] pgmap v59642: 576 pgs: 576 active+clean; 2.4 TiB data, 4.7 TiB used, 12 TiB / 16 TiB avail; 3.3 KiB/s rd, 2.7 MiB/s wr, 63 op/s
2022-11-19T16:16:54.144082+0000 mgr.pve-02 (mgr.18134134) 60416 : cluster [DBG] pgmap v59643: 576 pgs: 576 active+clean; 2.4 TiB data, 4.7 TiB used, 12 TiB / 16 TiB avail; 2.7 KiB/s rd, 1.3 MiB/s wr, 56 op/s
I am running Ceph Quincy 17.2.5 on a test system with a dedicated
1 Gbit/s (MTU 9000) storage network, while the 1 Gbit/s (MTU 1500)
Ceph public network is shared with the VM network.
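For completeness, the client vs. recovery I/O rates during backfill can
be watched per pool with something like this (again, "rbd" only as an
example pool name):
# per-pool client and recovery I/O rates
ceph osd pool stats rbd
# overall cluster health and recovery progress
ceph -s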
I am looking forward to your suggestions.
Regards,
ppa. Martin Konold
--
Martin Konold - Prokurist, CTO
KONSEC GmbH - make things real
Amtsgericht Stuttgart, HRB 23690
Geschäftsführer: Andreas Mack
Im Köller 3, 70794 Filderstadt, Germany