On Mon, Mar 14, 2022 at 05:35:05PM +0100, Dietmar Eggemann wrote:
> On 09/03/2022 19:26, Darren Hart wrote:
> > On Wed, Mar 09, 2022 at 01:50:07PM +0100, Dietmar Eggemann wrote:
> >> On 08/03/2022 18:49, Darren Hart wrote:
> >>> On Tue, Mar 08, 2022 at 05:03:07PM +0100, Dietmar Eggemann wrote:
> >>>> On 08/03/2022 12:04, Vincent Guittot wrote:
> >>>>> On Tue, 8 Mar 2022 at 11:30, Will Deacon <will@xxxxxxxxxx> wrote:
>
> [...]
>
> >>>> I do not have any better idea than this tweak here either in case the
> >>>> platform can't provide a cleaner setup.
> >>>
> >>> I'd argue The platform is describing itself accurately in ACPI PPTT
> >>> terms. The topology doesn't fit nicely within the kernel abstractions
> >>> today. This is an area where I hope to continue to improve things going
> >>> forward.
> >>
> >> I see. And I assume lying about SCU/LLC boundaries in ACPI is not an
> >> option since it messes up /sys/devices/system/cpu/cpu0/cache/index*/.
> >>
> >> [...]
> >
> > I'm not aware of a way to accurately describe the SCU topology in the PPTT, and
> > the risk we run with lying about LLC topology is that lie has to be comprehended
> > by all OSes and not conflict with other lies people may ask for. In general, I
> > think it is preferable and more maintainable to describe the topology as
> > accurately and honestly as we can within the existing platform mechanisms (PPTT,
> > HMAT, etc) and work on the higher level abstractions to accommodate a broader
> > set of topologies as they emerge (as well as working to more fully describe the
> > topology with new platform level mechanisms as needed).
> >
> > As I mentioned, I intend to continue looking in to how to improve the current
> > abstractions. For now, it sounds like we have agreement that this patch can be
> > merged to address the BUG?
>
> What about swapping the CLS and MC cpumasks for such a machine? This
> would avoid that the task scheduler has to deal with a system which has
> CLS but no MC. We essentially promote the CLS cpumask up to MC in this
> case.
>
> cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> MC
> ^^
> DIE
> NUMA
>
> cat /sys/kernel/debug/sched/domains/cpu0# cat domain*/flags
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
>                                                                   ^^^^^^^^^^^^^^^^^^^^^^
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_PREFER_SIBLING
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SERIALIZE SD_OVERLAP SD_NUMA
>
> Only very lightly tested on Altra and Juno-r0 (DT).
>
> --->8---
>
> From 54bef59e7f50fa41b7ae39190fd71af57209c27d Mon Sep 17 00:00:00 2001
> From: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Date: Mon, 14 Mar 2022 15:08:23 +0000
> Subject: [PATCH] arch_topology: Swap MC & CLS SD mask if MC weight==1 &
>  subset(MC,CLS)
>
> This avoids the issue of having a system with a CLS SD but no MC SD.
> CLS should be sub-SD of MC.

Hi Dietmar,

Ultimately, this delivers the same result. I do think it imposes more
complexity on everyone to address what, as far as I'm aware, only affects
the one system.

I don't think the term "Cluster" has a clear and universally understood
definition, so I don't think it's a given that "CLS should be sub-SD of MC".
I think this has been assumed, and that assumption has mostly held up, but
this is an abstraction, and in my opinion the abstraction should follow the
physical topologies rather than the other way around. If that's the primary
motivation for this approach, I don't think it justifies the additional
complexity.

All told, I prefer the 2-line change contained within cpu_coregroup_mask(),
which handles the one known exception with minimal impact.
It's easy enough to come back to this and address more cases with a more
complex solution if needed in the future, but I prefer to introduce as
little complexity as possible to address the known issues, especially when
the end result is the same and the cost is paid by the affected systems.

Thanks,

>
> The cpumask under /sys/devices/system/cpu/cpu*/cache/index* and
> /sys/devices/system/cpu/cpu*/topology are not changed by this.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> ---
>  drivers/base/arch_topology.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..9af90a5625c7 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -614,7 +614,7 @@ static int __init parse_dt_topology(void)
>  struct cpu_topology cpu_topology[NR_CPUS];
>  EXPORT_SYMBOL_GPL(cpu_topology);
>  
> -const struct cpumask *cpu_coregroup_mask(int cpu)
> +const struct cpumask *_cpu_coregroup_mask(int cpu)
>  {
>  	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
>  
> @@ -631,11 +631,37 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>  	return core_mask;
>  }
>  
> -const struct cpumask *cpu_clustergroup_mask(int cpu)
> +const struct cpumask *_cpu_clustergroup_mask(int cpu)
>  {
>  	return &cpu_topology[cpu].cluster_sibling;
>  }
>  
> +static int
> +swap_masks(const cpumask_t *core_mask, const cpumask_t *cluster_mask)
> +{
> +	if (cpumask_weight(core_mask) == 1 &&
> +	    cpumask_subset(core_mask, cluster_mask))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +const struct cpumask *cpu_coregroup_mask(int cpu)
> +{
> +	const cpumask_t *cluster_mask = _cpu_clustergroup_mask(cpu);
> +	const cpumask_t *core_mask = _cpu_coregroup_mask(cpu);
> +
> +	return swap_masks(core_mask, cluster_mask) ? cluster_mask : core_mask;
> +}
> +
> +const struct cpumask *cpu_clustergroup_mask(int cpu)
> +{
> +	const cpumask_t *cluster_mask = _cpu_clustergroup_mask(cpu);
> +	const cpumask_t *core_mask = _cpu_coregroup_mask(cpu);
> +
> +	return swap_masks(core_mask, cluster_mask) ? core_mask : cluster_mask;
> +}
> +
>  void update_siblings_masks(unsigned int cpuid)
>  {
>  	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
> --
> 2.25.1

--
Darren Hart
Ampere Computing / OS and Kernel