On Thu, Mar 03, 2022 at 06:36:30PM +1300, Barry Song wrote:
> On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
> <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > >
> > > > BUG: arch topology borken
> > > > the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max system.
> > > >
> > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > >
> > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > detect the cluster level being above the cpu llc level. This is
> > > > consistent with other architectures and provides a readily extensible
> > > > mechanism for other alternate topologies.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > >   CONFIG_SCHED_CLUSTER=y
> > > >     CLS [0-1]
> > > >     DIE [0-79]
> > > >     NUMA [0-159]
> > > >
> > > >   CONFIG_SCHED_CLUSTER is not set
> > > >     DIE [0-79]
> > > >     NUMA [0-159]
> > > >
> > > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > Cc: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > > > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > > > Cc: D. Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
> > > > Cc: Ilkka Koskinen <ilkka@xxxxxxxxxxxxxxxxxxxxxx>
> > > > Cc: <stable@xxxxxxxxxxxxxxx> # 5.16.x
> > > > Signed-off-by: Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx>
> > > > ---
> > > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > >  1 file changed, 28 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > >          }
> > > >  }
> > > >
> > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > +        { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > +#endif
> > > > +
> > > > +        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > +        { NULL, },
> > > > +};
> > > > +
> > > > +static void __init update_sched_domain_topology(void)
> > > > +{
> > > > +        int cpu;
> > > > +
> > > > +        for_each_possible_cpu(cpu) {
> > > > +                if (cpu_topology[cpu].llc_id != -1 &&
> > >
> > > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > > by ACPI system and llc_id == -1 for others like DT based system
> > >
> > > > +                    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > +                        return;
> > > > +        }
> >
> > Hi Vincent,
> >
> > I did not have a non-acpi system to test, no. You're right of course,
> > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> >
> > +        for_each_possible_cpu(cpu) {
> > +                if (cpu_topology[cpu].llc_id == -1 ||
> > +                    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > +                        return;
> > +        }
> >
> > Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> > arm64 systems who can test and verify this behaves as intended. I will ask
> > around tomorrow as well to see what I may have access to.
>
> I wonder if we can fix it by this
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..551655ccd0eb 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>                  if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
>                          core_mask = &cpu_topology[cpu].llc_sibling;
>          }
> +        /*
> +         * Some machines have no LLC but have clusters, we let MC = CLUSTER
> +         * as MC should always be after CLUSTER. But anyway, the MC domain
> +         * will be removed
> +         */
> +        if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> +                core_mask = &cpu_topology[cpu].cluster_sibling;
>
>          return core_mask;
>  }
>
> as it can make all kinds of topologies happy - symmetric and asymmetric.
>

Hah. Full circle.

Yes, this works, and it's basically what we'd started with internally. I
ended up exploring various paths here to avoid a "band aid" and to target
the fix and minimize impact.

That said, after digging through the acpi, topology, smp, and sched domains
code... I don't think this approach is a band aid and it's a very minimal
solution. The only downside I can think of is masking a potential topology
bug and not catching it in the scheduler - that seems very unlikely.

I'm perfectly happy with this solution as well. Will D, would you prefer
this approach?

+Sudeep, Greg, and Rafael,

Are you OK with this approach? If so, we can drop my arm64 specific new
topology patch and I can send a version of this one out (suggested-by Barry
of course), unless you'd prefer to send it Barry?

Thanks,

> >
> > Thanks,
> >
> > > > +
> > > > +        pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > +        set_sched_topology(arm64_no_mc_topology);
> > > > +}
> > > > +
> > > >  void __init smp_cpus_done(unsigned int max_cpus)
> > > >  {
> > > >          pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > >          hyp_mode_check();
> > > >          apply_alternatives_all();
> > > >          mark_linear_text_alias_ro();
> > > > +        update_sched_domain_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel
>
> Thanks
> Barry

--
Darren Hart
Ampere Computing / OS and Kernel
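
For anyone following the thread without a kernel tree at hand, the effect of the subset check discussed above can be modelled in userspace. This is only a rough sketch, not kernel code: `toy_mask`, `struct toy_topology`, `toy_subset()`, `toy_coregroup_mask()` and `toy_check()` are hypothetical names invented for illustration, a 64-bit integer stands in for `struct cpumask`, and the mask selection is assumed to start from the package siblings. The CPU0 values (a private LLC of weight 1 and a two-CPU cluster on an 8-CPU package) are made up to mirror the weights described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t toy_mask;

/* Hypothetical, simplified per-CPU record (not the kernel's cpu_topology). */
struct toy_topology {
	int llc_id;               /* -1 when no LLC information is available */
	toy_mask core_sibling;    /* CPUs in the same package (assumed start) */
	toy_mask llc_sibling;     /* CPUs sharing the last level cache */
	toy_mask cluster_sibling; /* CPUs in the same cluster */
};

/* True when every CPU set in a is also set in b, like cpumask_subset(). */
static bool toy_subset(toy_mask a, toy_mask b)
{
	return (a & ~b) == 0;
}

/*
 * Simplified model of choosing the MC-level mask. extend_to_cluster
 * switches the check from the proposed arch_topology.c change on or off.
 */
static toy_mask toy_coregroup_mask(const struct toy_topology *t,
				   bool extend_to_cluster)
{
	toy_mask core_mask = t->core_sibling;

	/* Narrow to the LLC siblings when LLC information exists. */
	if (t->llc_id != -1 && toy_subset(t->llc_sibling, core_mask))
		core_mask = t->llc_sibling;

	/* Proposed check: do not let MC end up smaller than the cluster. */
	if (extend_to_cluster && toy_subset(core_mask, t->cluster_sibling))
		core_mask = t->cluster_sibling;

	return core_mask;
}

/* Walks the made-up CPU0 case; call this from main() to run the sketch. */
static void toy_check(void)
{
	struct toy_topology cpu0 = {
		.llc_id = 0,
		.core_sibling = 0xff,    /* CPUs 0-7 */
		.llc_sibling = 0x01,     /* CPU 0 only, weight 1 */
		.cluster_sibling = 0x03, /* CPUs 0-1, weight 2 */
	};

	/* Without the check, CLS [0-1] is not a subset of MC [0]. */
	assert(!toy_subset(cpu0.cluster_sibling,
			   toy_coregroup_mask(&cpu0, false)));

	/* With the check, MC is widened to the cluster and CLS fits in MC. */
	assert(toy_subset(cpu0.cluster_sibling,
			  toy_coregroup_mask(&cpu0, true)));
}
```

In this toy model the MC mask ends up identical to the cluster mask, which matches the expectation stated in the thread that the MC domain is then redundant and gets removed.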