> Wow, that is impressive and sounds opposite of what we see around
> here. Often rebalances directly and strongly impact client I/O.

It might be the missing settings:

osd_op_queue = wpq
osd_op_queue_cut_off = high

If the cluster comes from kraken, these might be inherited with different values. Set these on "global"; it's more than just the OSDs that use these settings.

> I am, though, very confused why PGs would actually turn degraded, as
> the failure of the OSDs had already been corrected before.

This is expected behaviour. The OSDs are still present in the crush map and used for placement calculations. The idea is to reduce data movement between the other (healthy) OSDs during disk replacement. The procedure is to let the cluster heal and use osd destroy to maintain the IDs:

Monitor commands:
=================
osd destroy <osdname (id|osd.id)> {--yes-i-really-mean-it}   mark osd as being destroyed. Keeps the ID intact (allowing reuse), but removes cephx keys, config-key data and lockbox keys, rendering data permanently unreadable.

Then you can deploy new OSDs on the same hosts with these IDs and the data will move back with minimal movement between the other OSDs again. The manual deployment commands accept OSD IDs as an optional argument for this reason.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
Sent: 12 August 2021 15:56:49
To: Frank Schilder
Cc: Nico Schottelius; Ceph Users
Subject: Re: Very slow I/O during rebalance - options to tune?

Hey Frank,

Frank Schilder <frans@xxxxxx> writes:

> The recovery_sleep options are the next choice to look at. Increase
> them and clients will get more I/O time slots. However, with your
> settings, I'm surprised clients are impacted at all. I usually leave
> the op-priority at its default and use osd-max-backfill=2..4 for
> HDDs. With this, clients usually don't notice anything. I'm running
> mimic 13.2.10 though.

Wow, that is impressive and sounds opposite of what we see around
here. Often rebalances directly and strongly impact client I/O.

I wonder if this is related to any inherited settings? This cluster
used to be kraken-based and we usually followed the ceph upgrade
guide, but maybe some tunables are incorrect and influence the client
I/O speed?

I'll note recovery_sleep for the next rebalance and see how it changes
the client I/O.

Something funky happened during this rebalance:

- the trigger was dead OSD removal: OSDs that had been out for days
- when triggering osd rm & crush remove, there were ~30 PGs marked as degraded
- it seems that after the degraded PGs had been fixed, the I/O went mostly back to normal

I am, though, very confused why PGs would actually turn degraded, as
the failure of the OSDs had already been corrected before.

Is this a bug or expected behaviour?

Cheers,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
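
For anyone following along, the settings and replacement procedure described above translate roughly into the commands below. This is only a sketch: the OSD ID 12 and the device path /dev/sdX are placeholders, the two queue settings only take effect after the OSDs are restarted, and exact flags can differ between Ceph releases.

# ceph.conf -- set on [global] so more than just the OSDs pick the values up
[global]
osd_op_queue = wpq
osd_op_queue_cut_off = high

# after the cluster has healed, retire the dead OSD but keep its ID
ceph osd destroy 12 --yes-i-really-mean-it

# redeploy on the replacement disk, reusing the same ID so data moves
# back with minimal shuffling between the other OSDs
ceph-volume lvm create --osd-id 12 --data /dev/sdX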