On 2023-07-05 at 13:57:02 +0200, Peter Zijlstra wrote:
> On Fri, Jun 16, 2023 at 12:04:48PM +0530, K Prateek Nayak wrote:
>
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -596,7 +596,7 @@ static inline int x86_sched_itmt_flags(v
>  #ifdef CONFIG_SCHED_MC
>  static int x86_core_flags(void)
>  {
> -	return cpu_core_flags() | x86_sched_itmt_flags();
> +	return cpu_core_flags() | x86_sched_itmt_flags() | SD_IDLE_SIBLING;
>  }
I guess this flag might need to be added into the valid mask:

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d3a3b2646ec4..4a563e9f7b10 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1540,6 +1540,7 @@ static struct cpumask ***sched_domains_numa_masks;
 #define TOPOLOGY_SD_FLAGS		\
 	(SD_SHARE_CPUCAPACITY	|	\
 	 SD_SHARE_PKG_RESOURCES |	\
+	 SD_IDLE_SIBLING	|	\
 	 SD_NUMA		|	\
 	 SD_ASYM_PACKING)

>  #endif
>  #ifdef CONFIG_SCHED_SMT
> --- a/include/linux/sched/sd_flags.h
> +++ b/include/linux/sched/sd_flags.h
> @@ -161,3 +161,10 @@ SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT |
>   * NEEDS_GROUPS: No point in preserving domain if it has a single group.
>   */
>  SD_FLAG(SD_NUMA, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> +
> +/*
> + * Search for idle CPUs in sibling groups
> + *
> + * NEEDS_GROUPS: Load balancing flag.
> + */
> +SD_FLAG(SD_IDLE_SIBLING, SDF_NEEDS_GROUPS)
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7046,6 +7046,38 @@ static int select_idle_cpu(struct task_s
>  }
>
>  /*
> + * For the multiple-LLC per node case, make sure to try the other LLC's if the
> + * local LLC comes up empty.
> + */
> +static int
> +select_idle_node(struct task_struct *p, struct sched_domain *sd, int target)
> +{
> +	struct sched_domain *parent = sd->parent;
> +	struct sched_group *sg;
> +
> +	/* Make sure to not cross nodes. */
> +	if (!parent || parent->flags & SD_NUMA)
> +		return -1;
> +
> +	sg = parent->groups;
> +	do {
> +		int cpu = cpumask_first(sched_group_span(sg));
> +		struct sched_domain *sd_child = per_cpu(sd_llc, cpu);
>
I wonder if we can use rcu_dereference() in case the cpu hotplug
changes the content sd_llc points to.
(I'm still thinking of the symptom you described here:)
https://lore.kernel.org/lkml/20230605190746.GX83892@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

I'll launch some tests with this version on Sapphire Rapids (and with/without
LLC-split hack patch).

thanks,
Chenyu
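
P.S. To illustrate the valid-mask point, here is a minimal userspace sketch, not kernel code. It assumes the topology setup path warns about and strips any flag that falls outside TOPOLOGY_SD_FLAGS, which is how I read sd_init(). The bit values, the _OLD/_NEW mask names and filter_topology_flags() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit positions, for illustration only. The real values are
 * generated from include/linux/sched/sd_flags.h. */
#define SD_SHARE_CPUCAPACITY	(1u << 0)
#define SD_SHARE_PKG_RESOURCES	(1u << 1)
#define SD_IDLE_SIBLING		(1u << 2)
#define SD_NUMA			(1u << 3)
#define SD_ASYM_PACKING		(1u << 4)

/* Valid mask without the new flag (hypothetical name). */
#define TOPOLOGY_SD_FLAGS_OLD		\
	(SD_SHARE_CPUCAPACITY	|	\
	 SD_SHARE_PKG_RESOURCES |	\
	 SD_NUMA		|	\
	 SD_ASYM_PACKING)

/* Valid mask with the new flag added (hypothetical name). */
#define TOPOLOGY_SD_FLAGS_NEW	(TOPOLOGY_SD_FLAGS_OLD | SD_IDLE_SIBLING)

/*
 * Assumed behaviour of the sanity check: warn about any flag outside the
 * valid mask, then drop it. Hypothetical helper, not a kernel function.
 */
static unsigned int filter_topology_flags(unsigned int sd_flags,
					  unsigned int valid_mask)
{
	if (sd_flags & ~valid_mask) {
		fprintf(stderr, "wrong sd_flags in topology description\n");
		sd_flags &= valid_mask;
	}
	return sd_flags;
}

/* Small self-check: the flag is lost with the old mask, kept with the new. */
static void check_idle_sibling_survives(void)
{
	unsigned int mc_flags = SD_SHARE_PKG_RESOURCES | SD_IDLE_SIBLING;

	assert(!(filter_topology_flags(mc_flags, TOPOLOGY_SD_FLAGS_OLD) &
		 SD_IDLE_SIBLING));
	assert(filter_topology_flags(mc_flags, TOPOLOGY_SD_FLAGS_NEW) &
	       SD_IDLE_SIBLING);
}
```

If that assumption holds, then without the mask change the MC domain would never see SD_IDLE_SIBLING, even though x86_core_flags() returns it.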