+Cc Jeremy

On 19/10/20 14:10, Morten Rasmussen wrote:
> Hi Jonathan,
> The problem I see is that the benefit of keeping tasks together due to
> the interconnect layout might vary significantly between systems. So if
> we introduce a new cpumask for cluster it has to represent roughly
> the same system properties, otherwise generic software consuming this
> information could be tricked.
>
> If there is a provable benefit of having interconnect grouping
> information, I think it would be better represented by a distance matrix
> like we have for NUMA.
>
> Morten

That's my cue to paste some of that stuff I've been rambling on and off
about!

With regards to cache / interconnect layout, I do believe that if we
want to support it in the scheduler itself then we should leverage some
distance table rather than create X extra scheduler topology levels.

I had a chat with Jeremy on the ACPI side of that some time ago. IIRC,
given that SLIT gives us a distance value between any two PXMs, we could
directly express core-to-core distance in that table. With that (and if
that still lets us properly discover NUMA node spans), we could let the
scheduler build dynamic NUMA-like topology levels representing the inner
quirks of the cache / interconnect layout; a toy sketch of what that
could look like is at the end of this mail.

It's mostly pipe dreams for now, but there seems to be more and more
hardware where that would make sense; somewhat recently the PowerPC folks
added something to their arch-specific code in that regard.
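
To make the above a bit more concrete, here's the kind of thing I have in
mind, written as a standalone userspace toy rather than actual scheduler
code, with an entirely made-up per-core distance matrix (10 = self,
12 = same cluster, 20 = cross cluster; one PXM per core). It mimics what
sched_init_numa() does with node_distance(): collect the unique distance
values and derive one topology level span per value:

/*
 * Toy sketch only: a SLIT-style core-to-core distance matrix for one
 * NUMA node made of two 4-core clusters. Values are invented, not
 * real firmware data.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_CORES 8

static const int core_distance[NR_CORES][NR_CORES] = {
	{ 10, 12, 12, 12, 20, 20, 20, 20 },
	{ 12, 10, 12, 12, 20, 20, 20, 20 },
	{ 12, 12, 10, 12, 20, 20, 20, 20 },
	{ 12, 12, 12, 10, 20, 20, 20, 20 },
	{ 20, 20, 20, 20, 10, 12, 12, 12 },
	{ 20, 20, 20, 20, 12, 10, 12, 12 },
	{ 20, 20, 20, 20, 12, 12, 10, 12 },
	{ 20, 20, 20, 20, 12, 12, 12, 10 },
};

int main(void)
{
	int levels[NR_CORES * NR_CORES];
	int nr_levels = 0;

	/* Collect the unique distances; each becomes a topology level. */
	for (int i = 0; i < NR_CORES; i++) {
		for (int j = 0; j < NR_CORES; j++) {
			int d = core_distance[i][j];
			bool seen = false;

			for (int k = 0; k < nr_levels; k++)
				if (levels[k] == d)
					seen = true;
			if (!seen)
				levels[nr_levels++] = d;
		}
	}

	/*
	 * For each level, print the span of core 0, i.e. the cores
	 * reachable within that distance. This is roughly how the
	 * NUMA levels' sched_domain spans get derived.
	 */
	for (int k = 0; k < nr_levels; k++) {
		printf("level (distance <= %d): cores", levels[k]);
		for (int j = 0; j < NR_CORES; j++)
			if (core_distance[0][j] <= levels[k])
				printf(" %d", j);
		printf("\n");
	}
	return 0;
}

With the matrix above you'd end up with three levels for core 0: itself,
its cluster ({0-3}), and the whole node ({0-7}) - which is exactly the
kind of grouping a cluster cpumask would describe, except the distance
table also tells you *how much* closer the cluster is.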