Qais reported [1] that iterating over all tasks when rebuilding root
domains, to find out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains, can be a costly operation (10+
ms delays on suspend-resume). He proposed we skip rebuilding root
domains for certain operations, but that approach seemed arch-specific
and possibly prone to errors, as paths that ultimately trigger a rebuild
might be quite convoluted (thanks Qais for spending time on this!).

To fix the problem I would instead propose we

 1 - Bring back cpuset_mutex (so that we have write access to cpusets
     from scheduler operations - and we also fix some problems
     associated with percpu_cpuset_rwsem)
 2 - Keep track of the number of DEADLINE tasks belonging to each cpuset
 3 - Use this information to only perform the costly iteration if
     DEADLINE tasks are actually present in the cpuset for which a
     corresponding root domain is being rebuilt

This set is also available from

 https://github.com/jlelli/linux.git deadline/rework-cpusets

Feedback is more than welcome.

Best,
Juri

1 - https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@xxxxxxxxxxx/

Juri Lelli (3):
  sched/cpuset: Bring back cpuset_mutex
  sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
  cgroup/cpuset: Iterate only if DEADLINE tasks are present

 include/linux/cpuset.h |  12 ++-
 kernel/cgroup/cgroup.c |   4 +
 kernel/cgroup/cpuset.c | 175 +++++++++++++++++++++++------------------
 kernel/sched/core.c    |  32 ++++++--
 4 files changed, 137 insertions(+), 86 deletions(-)

-- 
2.39.2
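
For illustration only, the idea behind points 2 and 3 can be sketched in
plain userspace C. This is not code from the series: the toy_* structures,
field names and helpers are hypothetical stand-ins, and locking (the role
cpuset_mutex plays in point 1) is left out entirely. The sketch only shows
how a per-cpuset counter lets a rebuild skip the task walk when no
DEADLINE task is present.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task. */
struct toy_task {
	int is_deadline;       /* non-zero if the task is a DEADLINE task */
	unsigned long dl_bw;   /* bandwidth to restore on the root domain */
};

/* Hypothetical, simplified stand-in for a cpuset. */
struct toy_cpuset {
	int nr_deadline_tasks;        /* point 2: DEADLINE tasks in this cpuset */
	struct toy_task *tasks;       /* all tasks attached to this cpuset */
	size_t nr_tasks;
	unsigned long tasks_visited;  /* instrumentation for this sketch only */
};

/* Called when a DEADLINE task joins the cpuset (assumed hook). */
void toy_inc_dl_tasks(struct toy_cpuset *cs)
{
	cs->nr_deadline_tasks++;
}

/* Called when a DEADLINE task leaves the cpuset (assumed hook). */
void toy_dec_dl_tasks(struct toy_cpuset *cs)
{
	assert(cs->nr_deadline_tasks > 0);
	cs->nr_deadline_tasks--;
}

/*
 * Point 3: return the total DEADLINE bandwidth to restore for this
 * cpuset, walking the task list only if the counter says there is
 * something to find.
 */
unsigned long toy_rebuild_dl_bw(struct toy_cpuset *cs)
{
	unsigned long total = 0;
	size_t i;

	if (cs->nr_deadline_tasks == 0)
		return 0;	/* fast path: skip the costly iteration */

	for (i = 0; i < cs->nr_tasks; i++) {
		cs->tasks_visited++;
		if (cs->tasks[i].is_deadline)
			total += cs->tasks[i].dl_bw;
	}

	return total;
}
```

In this toy model a cpuset whose counter is zero returns immediately
without visiting any task, which is the case the 10+ ms delays come from
when the walk is done unconditionally. The counter is only useful if the
inc/dec helpers are called on every path where a DEADLINE task enters or
leaves a cpuset.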