On Mon, Aug 19, 2024 at 5:34 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Mon, 19 Aug 2024 12:15:10 +0200 Erwan Velu wrote:
> > 2/ I was also wondering if we shouldn't have a kernel module option to
> > choose the allocation algorithm (I have a POC in that direction).
> > The benefit could be allowing the platform owner to select the
> > allocation algorithm that the sys-admin needs.
> > On single-package AMD EPYC servers, the NUMA topology is pretty handy
> > for mapping the L3 affinity, but it doesn't provide any particular hint
> > about the actual "distance" to the network device.
> > You can have up to 12 NUMA nodes on a single package, but the actual
> > distance to the NIC is almost identical, as each core needs to go
> > through the IO die to reach the PCI devices.
> > The NUMA allocation logic contains assumptions like a "1 NUMA node per
> > package" topology; the actual distance between nodes should be
> > considered in the allocation logic.
>
> I think user space has more information on what the appropriate
> placement is than the kernel. We can have a reasonable default,
> and maybe try not to stupidly reset the settings when config
> changes (I don't think mlx5 does that but other drivers do);
> but having a way to select algorithm would only work if there
> was a well understood and finite set of algorithms.

I totally agree with this view. I'm wondering if people who used to work
on the mlx driver can provide hints about this task. I have no idea if
it requires any particular work at the firmware level. Is this a complex
task to perform? That feature would be super helpful for precise tuning.

> IMHO we should try to sell this task to systemd-networkd or some other
> user space daemon. We now have netlink access to NAPI information,
> including IRQ<>NAPI<>queue mapping. It's possible to implement
> completely driver-agnostic IRQ mapping support from user space
> (without the need to grep irq names like we used to)

Clearly that would be a nice path to achieve this feature.
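For context, the legacy "grep irq names" approach Jakub refers to can be sketched in a few lines of Python: parse /proc/interrupts and match the driver-specific action names. The sample table and the mlx5-style naming below are illustrative assumptions, not output from a real system; the point is that this depends on each driver's naming convention, which is exactly what the netlink-based IRQ<>NAPI<>queue mapping avoids.

```python
import re

# Hypothetical /proc/interrupts excerpt; real action names vary by driver
# (mlx5 uses names like "mlx5_comp0@pci:0000:3b:00.0").
SAMPLE = """\
           CPU0       CPU1
  50:     123456          0  IRQ-PCI-MSI  mlx5_comp0@pci:0000:3b:00.0
  51:          0     654321  IRQ-PCI-MSI  mlx5_comp1@pci:0000:3b:00.0
  60:         42         42  IRQ-PCI-MSI  nvme0q1
"""

def irqs_for(pattern, text):
    """Map IRQ number -> action name for actions matching `pattern`.

    Mimics the fragile grep-based discovery: it only works as long as
    the driver keeps its IRQ naming scheme.
    """
    mapping = {}
    for line in text.splitlines()[1:]:  # skip the CPU header row
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        irq = int(fields[0].rstrip(":"))
        action = fields[-1]
        if re.search(pattern, action):
            mapping[irq] = action
    return mapping

print(irqs_for(r"mlx5_comp", SAMPLE))
# -> {50: 'mlx5_comp0@pci:0000:3b:00.0', 51: 'mlx5_comp1@pci:0000:3b:00.0'}
```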