Do you think maybe you should issue an immediate change/patch/update to
quincy to change the default to wpq? Given the cluster ending nature of the
problem?

On Wed, Jul 20, 2022 at 4:01 AM Sridhar Seshasayee <sseshasa@xxxxxxxxxx> wrote:

> Hi Daniel,
>
>> And further to my theory about the spin lock or similar, increasing my
>> recovery by 4-16x using wpq sees my cpu rise to 10-15% ( from 3% )...
>> but using mclock, even at very very conservative recovery settings sees a
>> median CPU usage of some multiple of 100% (eg. a multiple of a machine
>> core/thread usage per osd).
>
> The issue has been narrowed down to waiting threads of a work queue shard
> being unblocked prematurely during the wait period prior to dequeuing a
> future work item from the mclock queue. This is leading to high CPU usage
> as you have observed. WPQ uses sleep to throttle various operations, but
> mclock based on the set QoS parameters could schedule operations in the
> future. The work queue threads should ideally relinquish CPU and block
> until the set time duration, but this is evidently not happening currently.
> While the fix is being worked upon, do continue to provide us your feedback
> and use wpq in the interim. Thanks!
>
> -Sridhar
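
For anyone following the thread, the behaviour Sridhar describes (a worker that should block until a future item is due, but gets woken early) can be sketched roughly as below. This is only an illustration in Python, not Ceph code and not the actual fix; `wait_until` and `poke` are hypothetical names made up for the example. The point is that a thread woken before its deadline should recompute the remaining time and go back to a blocking wait, rather than looping on the CPU.

```python
import threading
import time


def wait_until(cond, deadline, clock=time.monotonic):
    """Block on `cond` until `deadline` (a time.monotonic() value).

    Returns how many times the thread was woken before the deadline.
    After each early wake-up it recomputes the remaining time and blocks
    again, so it does not burn CPU while waiting.
    """
    early_wakeups = 0
    with cond:
        while True:
            remaining = deadline - clock()
            if remaining <= 0:
                return early_wakeups
            # wait() releases the lock and blocks; it returns True when
            # the thread was notified before the timeout expired.
            if cond.wait(timeout=remaining):
                # A real scheduler would check here whether an earlier
                # item arrived; this sketch just goes back to sleep.
                early_wakeups += 1


def poke(cond, times, interval):
    """Hypothetical helper: wake the waiter `times` times, too early."""
    for _ in range(times):
        time.sleep(interval)
        with cond:
            cond.notify_all()


if __name__ == "__main__":
    cond = threading.Condition()
    deadline = time.monotonic() + 0.3
    poker = threading.Thread(target=poke, args=(cond, 3, 0.05))
    poker.start()
    woke_early = wait_until(cond, deadline)
    poker.join()
    print("early wake-ups:", woke_early)
    print("returned after deadline:", time.monotonic() >= deadline)
```

Without the loop around `cond.wait()`, the first early notification would end the wait well before the scheduled time, and a caller that keeps retrying would spin, which is at least consistent with the high per-OSD CPU usage reported above.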