Hello Paul,

On Fri, Jul 7, 2023 at 5:13 PM Paul Mezzanini <pfmeec@xxxxxxx> wrote:

> I recently got mclock going literally an order of magnitude faster. I
> would love to claim I found all the options myself but I collected the
> knowledge of what knobs I needed to turn from here.

Significant usability and design improvements have been made to the mClock
scheduler in the upstream Reef release, and they should soon be available in
Quincy as well. One of the major goals is to reduce the number of knobs to
tune and achieve more hands-free operation. Existing releases already went
part of the way by eliminating the need to tune the sleep and
operation-specific cost options.

Here are some of the major improvements (from the Reef release notes) that
should help:

1. The balanced profile is now the default mClock profile because it
   represents a compromise between prioritizing client IO and recovery IO.
   Users can then choose the high_client_ops profile to prioritize client IO
   or the high_recovery_ops profile to prioritize recovery IO.

2. QoS parameters such as reservation and limit are now specified as a
   fraction (range: 0.0 to 1.0) of the OSD’s IOPS capacity.

3. The cost parameters osd_mclock_cost_per_io_usec_* and
   osd_mclock_cost_per_byte_usec_* have been removed. The cost of an
   operation is now determined internally from the random IOPS and maximum
   sequential bandwidth capabilities of the OSD’s underlying device.

4. The random IOPS capacity is determined using 'osd bench' as before, but
   unrealistic results are no longer accepted: if the measurement crosses the
   threshold set by osd_mclock_iops_capacity_threshold_[hdd|ssd], a
   reasonable default is used instead. Users can still override the default
   IOPS capacity if it is not accurate, and the thresholds are configurable
   too. The maximum sequential bandwidth is defined by
   osd_mclock_max_sequential_bandwidth_[hdd|ssd], which is set to reasonable
   defaults.
   Again, these may be modified if not accurate. Together, these changes
   account for measurement inaccuracies and give users good control over
   specifying accurate OSD characteristics.

5. Degraded object recovery is given higher priority than misplaced object
   recovery because degraded objects present a data safety issue that merely
   misplaced objects do not. As a result, backfilling with the balanced and
   high_client_ops mClock profiles may progress more slowly than it did with
   the WeightedPriorityQueue (WPQ) scheduler. For faster recovery and
   backfills, the high_recovery_ops profile with modified QoS parameters
   should help.

6. The QoS allocations in all the mClock profiles have been optimized based
   on the above fixes and enhancements.

Please see the latest upstream documentation for more details:
https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/

The recommendation is to upgrade when feasible and provide your feedback,
questions and suggestions.

-Sridhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
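As a rough illustration of items 2 to 4 of the list above, here is a back-of-the-envelope Python sketch. It is not Ceph code. The function names are hypothetical, and the numeric defaults in ASSUMED are assumed values for illustration only; check `ceph config help <option>` on your own cluster. The cost formula is a simplified model assumed here: each op is charged a fixed per-IO overhead, expressed in bytes as sequential bandwidth divided by random IOPS, plus its own size. The real scheduler may apply further details, such as per-shard division, that this sketch omits.

```python
"""Simplified, hypothetical model of the Reef-style mClock inputs.

All defaults and formulas below are illustrative assumptions,
not values or code taken from Ceph itself.
"""

MiB = 1024 * 1024

# Assumed per-device-class values (illustrative only; verify on your cluster).
ASSUMED = {
    "hdd": {"iops_threshold": 500.0, "default_iops": 315.0,
            "seq_bw_bytes": 150 * MiB},
    "ssd": {"iops_threshold": 80000.0, "default_iops": 21500.0,
            "seq_bw_bytes": 1200 * MiB},
}


def effective_iops_capacity(measured_iops: float, device: str) -> float:
    """Keep the 'osd bench' result unless it crosses the threshold (item 4)."""
    cfg = ASSUMED[device]
    if measured_iops > cfg["iops_threshold"]:
        # Unrealistic measurement: fall back to the assumed default.
        return cfg["default_iops"]
    return measured_iops


def qos_iops(fraction: float, iops_capacity: float) -> float:
    """Convert a reservation/limit fraction (0.0-1.0) into IOPS (item 2)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    return fraction * iops_capacity


def op_cost_bytes(op_size_bytes: int, iops_capacity: float,
                  seq_bw_bytes: float) -> float:
    """Simplified cost model (item 3): a per-IO overhead in bytes
    (bandwidth / IOPS) plus the op's own size (at least 1 byte)."""
    per_io_overhead = seq_bw_bytes / iops_capacity
    return per_io_overhead + max(op_size_bytes, 1)


if __name__ == "__main__":
    # Hypothetical HDD OSD whose bench result looks unrealistically high.
    cap = effective_iops_capacity(1200.0, "hdd")
    print("IOPS capacity used:", cap)
    print("reservation of 0.5 ->", qos_iops(0.5, cap), "IOPS")
    print("cost of a 4 KiB op:",
          round(op_cost_bytes(4096, cap, ASSUMED["hdd"]["seq_bw_bytes"])), "bytes")
```

Under this model, a small random op on an HDD is dominated by the per-IO overhead rather than its size. That matches the intuition that seeks, not bytes, are the scarce resource on spinning disks.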