On Wed, Aug 21, 2024 at 10:23:11AM -0400, Waiman Long wrote:
> The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
> boot command line options, are used at boot time to exclude selected CPUs
> from running some kernel background processes to minimize disturbance
> to latency sensitive userspace applications. Some of housekeeping CPU
> masks are also checked at run time to avoid using those isolated CPUs.
>
> The cpuset subsystem is now able to dynamically create a set of isolated
> CPUs to be used in isolated cpuset partitions. The long term goal is
> to make the degree of isolation as close as possible to what can be
> done statically using those boot command line options.
>
> This patch is a step in that direction by making the housekeeping CPU
> mask APIs exclude the dynamically isolated CPUs when they are called
> at run time. The housekeeping CPU masks will fall back to the bootup
> default when all the dynamically isolated CPUs are released.
>
> A new housekeeping_exlude_isolcpus() function is added which is to be
> called by the cpuset subsystem to provide a list of isolated CPUs to
> be excluded.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>

It's a bit hard to review this for several reasons:

* First, because I'm doing it three months late, sorry about that.

* We need to get the HK_TYPE_KERNEL_NOISE patchset in because the
  gazillion types don't help. Let's ping the scheduler people again once
  -rc1 is released. I'm setting an alarm!

* It's hard to forecast what kind of synchronization will be needed
  against housekeeping cpumask updates. I need to audit all the users.
  But since all target CPUs are offline, there are just a few things left
  to consider. One of them is kthreads affinity and that should be at
  least partially solved by the kthread affinity patchset
  (https://lore.kernel.org/lkml/20241112142248.20503-1-frederic@xxxxxxxxxx/).
  Hopefully I'll manage to get that in for the upcoming merge window.

Some more thoughts:

> ---
>  include/linux/sched/isolation.h |   8 +++
>  kernel/sched/isolation.c        | 112 +++++++++++++++++++++++++++++++-
>  2 files changed, 119 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 2b461129d1fa..d64fa4e60138 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -27,6 +27,8 @@ extern bool housekeeping_enabled(enum hk_type type);
>  extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
>  extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
>  extern void __init housekeeping_init(void);
> +extern int housekeeping_exlude_isolcpus(const struct cpumask *isolcpus,
> +					unsigned long flags);
>
>  #else
>
> @@ -54,6 +56,12 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
>  }
>
>  static inline void housekeeping_init(void) { }
> +
> +static inline int housekeeping_exlude_isolcpus(struct cpumask *isolcpus,
> +					       unsigned long flags)
> +{
> +	return -EOPNOTSUPP;
> +}
>  #endif /* CONFIG_CPU_ISOLATION */
>
>  static inline bool housekeeping_cpu(int cpu, enum hk_type type)
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5891e715f00d..3018ba81eb65 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -28,7 +28,16 @@ struct housekeeping {
>  	unsigned long flags;
>  };
>
> -static struct housekeeping housekeeping;
> +static struct housekeeping housekeeping __read_mostly;
> +
> +/*
> + * Boot time housekeeping cpumask and flags
> + *
> + * If more than one of nohz_full or isolcpus are specified, the cpumask must
> + * be the same or the setup will fail.
> + */
> +static cpumask_var_t boot_hk_cpumask;
> +static unsigned long boot_hk_flags;
>
>  bool housekeeping_enabled(enum hk_type type)
>  {
> @@ -253,3 +262,104 @@ static int __init housekeeping_isolcpus_setup(char *str)
>  	return housekeeping_setup(str, flags);
>  }
>  __setup("isolcpus=", housekeeping_isolcpus_setup);
> +
> +/*
> + * Save bootup housekeeping cpumask and flags
> + */
> +static int housekeeping_save(void)
> +{
> +	enum hk_type type;
> +
> +	boot_hk_flags = housekeeping.flags;
> +	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
> +		if (!alloc_cpumask_var(&boot_hk_cpumask, GFP_KERNEL))
> +			return -ENOMEM;

So this leaks and overwrites the mask for each flag?

Also, only HK_TYPE_KERNEL_NOISE will be interesting.

> +		cpumask_copy(boot_hk_cpumask, housekeeping.cpumasks[type]);
> +		break;
> +	}
> +	return 0;
> +}

Should it be done on boot when housekeeping is allocated?

Thanks.
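
To make the allocation concern concrete, here is a rough userspace model of a "save the boot mask exactly once" scheme. It is only a sketch and not kernel code: a 64-bit word stands in for a cpumask, malloc() stands in for alloc_cpumask_var(), and every model_* name is hypothetical. It assumes the boot copy is taken a single time and that later updates are always computed from that copy, so that an empty isolated set restores the bootup default.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in: one 64-bit word plays the role of a cpumask. */
typedef uint64_t model_cpumask_t;

struct model_housekeeping {
	model_cpumask_t cpumask;	/* current housekeeping CPUs */
	unsigned long flags;		/* enabled housekeeping types */
};

static struct model_housekeeping model_hk;
static model_cpumask_t *model_boot_mask;	/* NULL until saved */
static unsigned long model_boot_flags;
static int model_alloc_count;			/* counts allocations */

/*
 * Save the boot-time mask and flags exactly once. Later calls return
 * early, so the saved copy is neither reallocated nor overwritten.
 */
static int model_hk_save_boot(void)
{
	if (model_boot_mask)
		return 0;

	model_boot_mask = malloc(sizeof(*model_boot_mask));
	if (!model_boot_mask)
		return -ENOMEM;
	model_alloc_count++;

	*model_boot_mask = model_hk.cpumask;
	model_boot_flags = model_hk.flags;
	return 0;
}

/*
 * Exclude the given isolated CPUs from the housekeeping mask. The new
 * mask is always derived from the saved boot mask, so passing an empty
 * set falls back to the bootup default.
 */
static int model_hk_exclude(model_cpumask_t isolcpus)
{
	int ret = model_hk_save_boot();

	if (ret)
		return ret;

	assert(model_boot_mask != NULL);
	model_hk.cpumask = *model_boot_mask & ~isolcpus;
	return 0;
}
```

In this model, repeated calls to model_hk_exclude() allocate only once. Doing the save at boot, when the housekeeping masks themselves are allocated, would remove even the lazy check.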