On Mon, Dec 23, 2024 at 03:39:56PM -0800, Yury Norov wrote:
> On Fri, Dec 20, 2024 at 04:11:41PM +0100, Andrea Righi wrote:
> > With the introduction of separate per-NUMA node cpumasks, we
> > automatically track idle CPUs within each NUMA node.
> >
> > This makes the special logic for determining idle CPUs in each NUMA node
> > redundant and unnecessary, so we can get rid of it.
>
> But it looks like you do more than that...
>
> > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> > ---
> >  kernel/sched/ext_idle.c | 93 ++++++++++-------------------------------
> >  1 file changed, 23 insertions(+), 70 deletions(-)
> >
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 013deaa08f12..b36e93da1b75 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -82,7 +82,6 @@ static void idle_masks_init(void)
> >  }
> >
> >  static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
> > -static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
> >
> >  /*
> >   * Return the node id associated to a target idle CPU (used to determine
> > @@ -259,25 +258,6 @@ static unsigned int numa_weight(s32 cpu)
> >  	return sg->group_weight;
> >  }
> >
> > -/*
> > - * Return the cpumask representing the NUMA domain of @cpu (or NULL if the NUMA
> > - * domain is not defined).
> > - */
> > -static struct cpumask *numa_span(s32 cpu)
> > -{
> > -	struct sched_domain *sd;
> > -	struct sched_group *sg;
> > -
> > -	sd = rcu_dereference(per_cpu(sd_numa, cpu));
> > -	if (!sd)
> > -		return NULL;
> > -	sg = sd->groups;
> > -	if (!sg)
> > -		return NULL;
> > -
> > -	return sched_group_span(sg);
>
> I didn't find llc_span() and node_span() in vanilla kernel. Does this series
> have prerequisites?

This patch set is based on the sched_ext/for-6.14 branch:
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/

I put sched_ext/for-6.14 in the cover email, but maybe it wasn't very
clear; I should have mentioned the git repo directly in this email.

>
> > -}
> > -
> >  /*
> >   * Return true if the LLC domains do not perfectly overlap with the NUMA
> >   * domains, false otherwise.
> > @@ -329,7 +309,7 @@ static bool llc_numa_mismatch(void)
> >   */
> >  static void update_selcpu_topology(struct sched_ext_ops *ops)
> >  {
> > -	bool enable_llc = false, enable_numa = false;
> > +	bool enable_llc = false;
> >  	unsigned int nr_cpus;
> >  	s32 cpu = cpumask_first(cpu_online_mask);
> >
> > @@ -348,41 +328,34 @@ static void update_selcpu_topology(struct sched_ext_ops *ops)
> >  	if (nr_cpus > 0) {
> >  		if (nr_cpus < num_online_cpus())
> >  			enable_llc = true;
> > +		/*
> > +		 * No need to enable LLC optimization if the LLC domains are
> > +		 * perfectly overlapping with the NUMA domains when per-node
> > +		 * cpumasks are enabled.
> > +		 */
> > +		if ((ops->flags & SCX_OPS_BUILTIN_IDLE_PER_NODE) &&
> > +		    !llc_numa_mismatch())
> > +			enable_llc = false;
>
> This doesn't sound like redundancy removal. I may be wrong, but this
> looks like a sort of optimization. If so, it deserves to be a separate
> patch.

So, the initial idea was to replace the current NUMA awareness logic
with the per-node cpumasks.
But in fact, we're doing this change:

 - before:
   - NUMA-awareness logic implicitly enabled if the node CPUs don't
     perfectly overlap with the LLC CPUs (otherwise it would be redundant)

 - after:
   - NUMA-awareness logic explicitly enabled when the scx scheduler sets
     SCX_OPS_BUILTIN_IDLE_PER_NODE in .flags (and in this case implicitly
     disable LLC awareness if the node/LLC CPUs are overlapping)

Maybe a better approach would be to keep the old NUMA/LLC logic exactly
as it is in sched_ext/for-6.14 when SCX_OPS_BUILTIN_IDLE_PER_NODE is not
specified, and otherwise use the new logic (and implicitly disable
scx_selcpu_topo_numa). In this way, this "removal" patch would only
implement the logic that disables scx_selcpu_topo_numa when
SCX_OPS_BUILTIN_IDLE_PER_NODE is used. Roughly something like the
(untested) sketch below.

-Andrea
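
Just to make the idea concrete, here is a rough, untested sketch of how
update_selcpu_topology() could look with that approach. It reuses the
llc_weight()/numa_weight()/llc_numa_mismatch() helpers from this series
(assuming they keep their current semantics) and assumes the
scx_selcpu_topo_numa static key stays around, so take it as an
illustration rather than a final implementation:

static void update_selcpu_topology(struct sched_ext_ops *ops)
{
	bool enable_llc = false, enable_numa = false;
	unsigned int nr_cpus;
	s32 cpu = cpumask_first(cpu_online_mask);

	rcu_read_lock();

	/*
	 * Enable the LLC optimization only if there are multiple LLC
	 * domains among the online CPUs.
	 */
	nr_cpus = llc_weight(cpu);
	if (nr_cpus > 0 && nr_cpus < num_online_cpus())
		enable_llc = true;

	if (ops->flags & SCX_OPS_BUILTIN_IDLE_PER_NODE) {
		/*
		 * New behavior: the per-node idle cpumasks already provide
		 * NUMA awareness, so keep scx_selcpu_topo_numa disabled and
		 * drop the LLC optimization when the LLC domains perfectly
		 * overlap with the NUMA domains (it would be redundant).
		 */
		if (!llc_numa_mismatch())
			enable_llc = false;
	} else {
		/*
		 * Old behavior (sched_ext/for-6.14): implicitly enable the
		 * NUMA optimization when there are multiple NUMA nodes and
		 * the NUMA domains don't perfectly overlap with the LLC
		 * domains.
		 */
		nr_cpus = numa_weight(cpu);
		if (nr_cpus > 0 && nr_cpus < num_online_cpus() &&
		    llc_numa_mismatch())
			enable_numa = true;
	}

	rcu_read_unlock();

	if (enable_llc)
		static_branch_enable_cpuslocked(&scx_selcpu_topo_llc);
	else
		static_branch_disable_cpuslocked(&scx_selcpu_topo_llc);

	if (enable_numa)
		static_branch_enable_cpuslocked(&scx_selcpu_topo_numa);
	else
		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
}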