Hi Martin,

In Quincy, osd_op_queue defaults to 'mclock_scheduler'. Before Quincy it was set to 'wpq'.

> On a 3-node hyper-converged PVE cluster with 12 SSD OSD devices, I
> experience stalls in RBD performance during normal backfill
> operations, e.g. moving a pool from 2/1 to 3/2.
>
> I was expecting that I could control the load caused by the backfilling
> using
>
>   ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
> or
>   ceph tell 'osd.*' injectargs '--osd-recovery-max-active 1'
>
> Even
>
>   ceph tell 'osd.*' config set osd_recovery_sleep_ssd 2.1
>
> did not help.
>
> Any hints?

Due to the way the mclock scheduler works, the sleep options, along with the backfill and recovery limits, cannot be modified. This is documented here:

https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#mclock-built-in-profiles

> I am running Ceph Quincy 17.2.5 on a test system with a dedicated
> 1 Gbit/9000 MTU storage network, while the public Ceph network
> (1 Gbit/1500 MTU) is shared with the VM network.
>
> I am looking forward to your suggestions.

The following optimizations, which change the backfill/recovery behavior you are observing, are slated to be merged:

1. Reduce the currently high limit set for backfill/recovery operations, which can overwhelm client operations in some situations.
2. Allow users to modify the backfill/recovery limits, if required, via another gating option.
3. Optimize the mclock profiles so that client and recovery operations get the desired IOPS allocations.

Until the next Quincy release, you can avoid the backfill/recovery issue by switching to the 'wpq' scheduler: set osd_op_queue = wpq and restart the OSDs.

-Sridhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
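[For reference, the wpq workaround described above can be sketched as the following command sequence. This is a sketch, not part of the original post; it assumes a systemd-based deployment (as on Proxmox VE) and uses osd.0 as an example daemon — adjust IDs and the restart command to your environment, and restart nodes one at a time to stay available:]

```shell
# Check which scheduler the OSDs are currently using
# (a default Quincy cluster reports 'mclock_scheduler'):
ceph config show osd.0 osd_op_queue

# Switch all OSDs to the wpq scheduler. osd_op_queue is only read at
# startup, so the OSDs must be restarted for this to take effect:
ceph config set osd osd_op_queue wpq

# Restart the OSDs on each node in turn; on a Proxmox VE / packaged
# systemd install this restarts all OSDs on the local node:
systemctl restart ceph-osd.target

# Confirm a running daemon picked up the new scheduler:
ceph daemon osd.0 config get osd_op_queue

# With wpq active, the usual recovery throttles apply again, e.g.:
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
```
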