+Cc Jeremy

On 19/10/20 14:10, Morten Rasmussen wrote:
> Hi Jonathan,
> The problem I see is that the benefit of keeping tasks together due to
> the interconnect layout might vary significantly between systems. So if
> we introduce a new cpumask for cluster it has to represent roughly
> the same system properties, otherwise generic software consuming this
> information could be tricked.
>
> If there is a provable benefit of having interconnect grouping
> information, I think it would be better represented by a distance matrix
> like we have for NUMA.
>
> Morten

That's my cue to paste some of that stuff I've been rambling on and off
about!

With regards to cache / interconnect layout, I do believe that if we
want to support it in the scheduler itself then we should leverage some
distance table rather than create X extra scheduler topology levels.

I had a chat with Jeremy on the ACPI side of that some time ago. IIRC,
given that SLIT gives us a distance value between any two PXMs, we could
directly express core-to-core distance in that table. With that (and if
that still lets us properly discover NUMA node spans), we could let the
scheduler build dynamic NUMA-like topology levels representing the inner
quirks of the cache / interconnect layout; a toy sketch of what that
could look like is at the end of this mail.

It's mostly pipe dreams for now, but there seems to be more and more
hardware where that would make sense; somewhat recently the PowerPC folks
added something to their arch-specific code in that regard.
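
To make the above a bit more concrete, here's the kind of thing I have in
mind, written as a standalone userspace toy rather than actual scheduler
code, with an entirely made-up per-core distance matrix (10 = self,
12 = same cluster, 20 = cross cluster; one PXM per core). It mimics what
sched_init_numa() does with node_distance(): collect the unique distance
values and derive one topology level span per value:

/*
 * Toy sketch only: a SLIT-style core-to-core distance matrix for one
 * NUMA node made of two 4-core clusters. Values are invented, not
 * real firmware data.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_CORES 8

static const int core_distance[NR_CORES][NR_CORES] = {
	{ 10, 12, 12, 12, 20, 20, 20, 20 },
	{ 12, 10, 12, 12, 20, 20, 20, 20 },
	{ 12, 12, 10, 12, 20, 20, 20, 20 },
	{ 12, 12, 12, 10, 20, 20, 20, 20 },
	{ 20, 20, 20, 20, 10, 12, 12, 12 },
	{ 20, 20, 20, 20, 12, 10, 12, 12 },
	{ 20, 20, 20, 20, 12, 12, 10, 12 },
	{ 20, 20, 20, 20, 12, 12, 12, 10 },
};

int main(void)
{
	int levels[NR_CORES * NR_CORES];
	int nr_levels = 0;

	/* Collect the unique distances; each becomes a topology level. */
	for (int i = 0; i < NR_CORES; i++) {
		for (int j = 0; j < NR_CORES; j++) {
			int d = core_distance[i][j];
			bool seen = false;

			for (int k = 0; k < nr_levels; k++)
				if (levels[k] == d)
					seen = true;
			if (!seen)
				levels[nr_levels++] = d;
		}
	}

	/*
	 * For each level, print the span of core 0, i.e. the cores
	 * reachable within that distance. This is roughly how the
	 * NUMA levels' sched_domain spans get derived.
	 */
	for (int k = 0; k < nr_levels; k++) {
		printf("level (distance <= %d): cores", levels[k]);
		for (int j = 0; j < NR_CORES; j++)
			if (core_distance[0][j] <= levels[k])
				printf(" %d", j);
		printf("\n");
	}
	return 0;
}

With the matrix above you'd end up with three levels for core 0: itself,
its cluster ({0-3}), and the whole node ({0-7}) - which is exactly the
kind of grouping a cluster cpumask would describe, except the distance
table also tells you *how much* closer the cluster is.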