Albert Huang <huangjie.albert@xxxxxxxxxxxxx> writes:

> AF_XDP is a kernel bypass technology that can greatly improve performance.
> However, for virtual devices like veth, even with the use of AF_XDP sockets,
> there are still many additional software paths that consume CPU resources.
> This patch series focuses on optimizing the performance of AF_XDP sockets
> for veth virtual devices. Patches 1 to 4 mainly involve preparatory work.
> Patch 5 introduces a tx queue and tx napi for packet transmission, patch 8
> primarily implements batch sending for IPv4 UDP packets, and patch 9 adds
> support for the AF_XDP tx need_wakeup feature. These optimizations
> significantly reduce the software path and support checksum offload.
>
> I tested these features with the typical topology shown below:
>
> client(send):                        server:(recv)
> veth <--> veth-peer        veth1-peer <---> veth1
>   1          |                  |             7
>              | 2              6 |
>              |                  |
> bridge <---> eth0(mlnx5) - switch - eth1(mlnx5) <---> bridge1
>    3              4                      5
>        (machine1)                (machine2)

I definitely applaud the effort to improve the performance of AF_XDP over
veth; this is something we have flagged as in need of improvement as well.
However, looking through your patch series, I am less sure that the approach
you are taking here is the right one.

AFAIU (speaking about the TX side here), the main difference between AF_XDP
ZC and the regular transmit mode is that in regular TX mode the stack
allocates an skb to hold the frame and pushes that down the stack, whereas in
ZC mode there is a driver NDO that gets called directly, bypassing the skb
allocation entirely. In this series you are implementing the ZC mode for
veth, but the driver code ends up allocating an skb anyway. That seems to be
a bit of a weird midpoint between the two modes, and it adds a lot of
complexity to the driver that (at least conceptually) is mostly just a
reimplementation of what the stack already does in non-ZC mode (allocate an
skb and push it through the stack).
So my question is: why not optimise the non-ZC path in the stack instead of
implementing the ZC logic in veth? It seems to me that it would be quite
feasible to apply the same optimisations (bulking, and even GRO) to that path
and achieve the same benefits, without having to add all this complexity to
the veth driver.

-Toke