On Wed, Jul 17, 2024 at 11:42:31AM +0200, Vlastimil Babka wrote:
> Seems to me it could be (except that ZONE_DMA corner case) a general
> scalability issue in that you tweak some part of the kernel and the
> contention moves elsewhere. At least in MM we have per-node locks so this
> means 256 CPUs per lock? It used to be that there were not that many
> (cores/threads) per a physical CPU and its NUMA node, so many cpus would
> mean also more NUMA nodes where the locks contention would distribute among
> them. I think you could try fakenuma to create these nodes artificially and
> see if it helps for the MM part. But if the contention moves to e.g. an
> inode lock, I'm not sure what to do about that then.

AMD EPYC BIOSes have an option called NPS (Nodes Per Socket) that can be set
to 1, 2, 4 or 8 and that divides each socket into the chosen number of NUMA
nodes.

Karim
PhD Student
Edinburgh University
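To check how many NUMA nodes the kernel actually sees after changing NPS (or after a fake-NUMA boot), you can read the node list that Linux exposes in sysfs. What follows is a minimal sketch. It assumes the standard `/sys/devices/system/node/online` file and its kernel range-list format (e.g. `0-3` or `0,2-3`). The helper names `count_nodes` and `online_numa_nodes` are hypothetical and used only for illustration.

```python
from pathlib import Path

NODE_ONLINE = "/sys/devices/system/node/online"  # standard Linux sysfs path


def count_nodes(node_list: str) -> int:
    """Count entries in a kernel range list such as "0-3" or "0,2-3"."""
    total = 0
    for part in node_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range "lo-hi" is inclusive on both ends.
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1
        else:
            # A single node id.
            total += 1
    return total


def online_numa_nodes(path: str = NODE_ONLINE) -> int:
    # Hypothetical helper: number of online NUMA nodes on this Linux host.
    return count_nodes(Path(path).read_text())


if __name__ == "__main__":
    if Path(NODE_ONLINE).exists():
        print(online_numa_nodes())
    else:
        print("no NUMA node information in sysfs")
```

On a two-socket machine, going from NPS1 to NPS4 should change the count from 2 to 8. Dividing the CPU count by the node count then gives a rough figure for how many CPUs share each per-node lock.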