On Mon, 19 Aug 2024 12:15:10 +0200 Erwan Velu wrote:
> 2/ I was also wondering if we shouldn't have a kernel module option to
> choose the allocation algorithm (I have a POC in that direction).
> The benefit could be allowing the platform owner to select the
> allocation algorithm that sys-admin needs.
> On single-package AMD EPYC servers, the numa topology is pretty handy
> for mapping the L3 affinity but it doesn't provide any particular hint
> about the actual "distance" to the network device.
> You can have up to 12 NUMA nodes on a single package but the actual
> distance to the nic is almost identical as each core needs to use the
> IOdie to reach the PCI devices.
> We can see in the NUMA allocation logic assumptions like "1 NUMA per
> package" logic that the actual distance between nodes should be
> considered in the allocation logic.

I think user space has more information on what the appropriate
placement is than the kernel. We can have a reasonable default,
and maybe try not to stupidly reset the settings when config
changes (I don't think mlx5 does that but other drivers do);
but having a way to select algorithm would only work if there
was a well understood and finite set of algorithms.

IMHO we should try to sell this task to systemd-networkd or some other
user space daemon. We now have netlink access to NAPI information,
including IRQ<>NAPI<>queue mapping. It's possible to implement a
completely driver-agnostic IRQ mapping support from user space
(without the need to grep irq names like we used to).
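
To make the user space idea a bit more concrete, here is a minimal
sketch of what the placement part of such a daemon could look like.
It is only an illustration under stated assumptions: the NAPI list is
assumed to have already been fetched over netlink (for example from a
netdev napi-get dump) and is represented as plain dicts with "id",
"ifindex" and "irq" keys; the function names plan_irq_affinity and
apply_irq_affinity are hypothetical, and round-robin is just a
placeholder policy, not a recommendation. The only kernel interface
it touches is /proc/irq/<irq>/smp_affinity_list.

```python
from pathlib import Path


def plan_irq_affinity(napis, cpus):
    """Assign each NAPI IRQ to one CPU, round-robin over `cpus`.

    `napis` is a list of dicts with an optional "irq" key (assumed
    shape, modeled on a netlink NAPI dump). Entries without an IRQ
    are skipped, since there is nothing to pin for them.
    """
    if not cpus:
        raise ValueError("need at least one CPU to place IRQs on")
    # De-duplicate and sort so the resulting plan is deterministic.
    irqs = sorted({n["irq"] for n in napis if "irq" in n})
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}


def apply_irq_affinity(plan, proc_irq=Path("/proc/irq")):
    """Write each planned CPU to <proc_irq>/<irq>/smp_affinity_list.

    Needs root on a real system; `proc_irq` is a parameter so the
    function can be pointed at a scratch directory for testing.
    """
    for irq, cpu in plan.items():
        (proc_irq / str(irq) / "smp_affinity_list").write_text(f"{cpu}\n")


if __name__ == "__main__":
    # Made-up sample data, not output from a real device.
    sample = [
        {"id": 513, "ifindex": 2, "irq": 40},
        {"id": 514, "ifindex": 2, "irq": 41},
        {"id": 515, "ifindex": 2, "irq": 42},
        {"id": 516, "ifindex": 2},  # NAPI instance with no IRQ
    ]
    print(plan_irq_affinity(sample, cpus=[0, 2]))
```

The CPU list is the piece user space is in a better position to pick:
on the single-package EPYC case above it could be chosen from L3
locality rather than NUMA distance, without the kernel or the driver
having to know about that policy.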