Hey Frank,

Frank Schilder <frans@xxxxxx> writes:

> The recovery_sleep options are the next choice to look at. Increase it
> and clients will get more I/O time slots. However, with your settings,
> I'm surprised clients are impacted at all. I usually leave the
> op-priority at its default and use osd-max-backfill=2..4 for HDDs.
> With this, clients usually don't notice anything. I'm running mimic
> 13.2.10 though.

Wow, that is impressive and the opposite of what we see around here:
rebalances regularly have a direct and strong impact on client I/O.

I wonder whether this is related to inherited settings. This cluster
started out on kraken and we usually followed the Ceph upgrade guides,
but maybe some tunables are incorrect and are influencing client I/O
speed?

I'll note recovery_sleep before the next rebalance and see how it
changes the client I/O.

Something funky happened during this rebalance:

- the trigger was the removal of dead OSDs that had been out for days
- when triggering osd rm & crush remove, ~30 PGs were marked as degraded
- after the degraded PGs had recovered, client I/O seemed mostly back
  to normal

I am, however, quite confused as to why PGs would turn degraded at all,
since the failure of those OSDs had already been repaired beforehand.
Is this a bug or expected behaviour?

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
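
For reference, a minimal sketch of how the throttles discussed above can
be adjusted at runtime, assuming a luminous-or-newer cluster where
injectargs is available (the actual option name is osd_max_backfills;
the values below are illustrative, not recommendations):

    # Slow recovery down so clients get more I/O time slots:
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.2'

    # Limit concurrent backfills per OSD:
    ceph tell osd.* injectargs '--osd_max_backfills 2'

    # Verify what an OSD is actually running with
    # (uses the admin socket, so run on the host where osd.0 lives):
    ceph daemon osd.0 config show | grep -e recovery_sleep -e max_backfills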
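
To rule out CRUSH tunables inherited from the kraken days, the profile
currently in effect can be inspected without changing anything:

    # Show the CRUSH tunables the cluster is currently using:
    ceph osd crush show-tunables

    # Note: actually switching profiles, e.g.
    #   ceph osd crush tunables optimal
    # can trigger a large rebalance of its own, so inspect before changing.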
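
And a sketch for watching PG states during the next dead-OSD removal
(osd.NN is a placeholder; on luminous or newer, ceph osd purge bundles
crush remove, auth del and osd rm into one step):

    # Remove a dead OSD in one step:
    ceph osd purge osd.NN --yes-i-really-mean-it

    # Watch which PGs go degraded and why:
    ceph health detail | grep -i degraded
    ceph pg dump_stuck degraded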