On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> Control Unit, but have no shared CPU-side last level cache.
>
> cpu_coregroup_mask() will return a cpumask with weight 1, while
> cpu_clustergroup_mask() will return a cpumask with weight 2.
>
> As a result, build_sched_domain() will BUG() once per CPU with:
>
> BUG: arch topology borken
> the CLS domain not a subset of the MC domain
>
> The MC level cpumask is then extended to that of the CLS child, and is
> later removed entirely as redundant. This sched domain topology is an
> improvement over previous topologies, or those built without
> SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> With the current scheduler model and heuristics, this is a desirable
> default topology for Ampere Altra and Altra Max system.
>
> Introduce an alternate sched domain topology for arm64 without the MC
> level and test for llc_sibling weight 1 across all CPUs to enable it.
>
> Do this in arch/arm64/kernel/smp.c (as opposed to
> arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> detect the cluster level being above the cpu llc level. This is
> consistent with other architectures and provides a readily extensible
> mechanism for other alternate topologies.
>
> The final sched domain topology for a 2 socket Ampere Altra system is
> unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
>
> For CPU0:
>
> CONFIG_SCHED_CLUSTER=y
> CLS [0-1]
> DIE [0-79]
> NUMA [0-159]
>
> CONFIG_SCHED_CLUSTER is not set
> DIE [0-79]
> NUMA [0-159]
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Cc: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> Cc: D. Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
> Cc: Ilkka Koskinen <ilkka@xxxxxxxxxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.16.x
> Signed-off-by: Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx>
> ---
>  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 27df5c1e6baa..3597e75645e1 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
>  	}
>  }
>
> +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +
> +#ifdef CONFIG_SCHED_CLUSTER
> +	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> +#endif
> +
> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +	{ NULL, },
> +};
> +
> +static void __init update_sched_domain_topology(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (cpu_topology[cpu].llc_id != -1 &&

Have you tested it with a non-ACPI system?
AFAICT, llc_id is only set on ACPI systems and stays at -1 on others,
such as DT-based systems.

> +		    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> +			return;
> +	}
> +
> +	pr_info("No LLC siblings, using No MC sched domains topology\n");
> +	set_sched_topology(arm64_no_mc_topology);
> +}
> +
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
>  	pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  	hyp_mode_check();
>  	apply_alternatives_all();
>  	mark_linear_text_alias_ro();
> +	update_sched_domain_topology();
>  }
>
>  void __init smp_prepare_boot_cpu(void)
> --
> 2.31.1
>