Federico Parola <federico.parola@xxxxxxxxx> writes:

> Dear all,
> I would like to share with this community a draft I recently wrote [1]
> on the performance comparison of XDP and AF_XDP packet processing.
> In the paper we found some interesting and unexpected results
> (especially related to the impact of addressed memory on the performance
> of the two technologies) and tried to envision a combined use of the two
> technologies, especially to tackle the poor performance of re-injecting
> packets into the kernel from user space to leverage the TCP/IP stack.
> Any comment and suggestion from this community or any type of joint
> work/collaboration would be very appreciated.

Hi Federico

Thank you for the link! All in all I thought it was a nicely done
performance comparison.

One thing that might be interesting would be to do the same comparison
on a different driver. A lot of the performance details you're
discovering in this paper boil down to details about how the driver
data path is implemented. For instance, it's an Intel-specific thing
that there's a whole separate path for zero-copy AF_XDP. Any plans to
replicate the study using, say, an mlx5-based NIC?

Also, a couple of comments on details:

- The performance delta you show in Figure 9 where AF_XDP is faster at
  hair-pin forwarding than XDP was a bit puzzling; the two applications
  should basically be doing the same thing. It seems to be because the
  i40e driver converts the xdp_buff struct to an xdp_frame before
  transmitting it out the interface again:

  https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L2280

- It's interesting that userspace seems to handle scattered memory
  accesses over a large range better than kernel-space. It would be
  interesting to know why; you mention you're leaving this to future
  studies, any plans of following up and trying to figure this out? :)

Finally, since you seem to have your tests packaged up nicely, do you
think it would be possible to take (some of) them and turn them into a
kind of "performance CI" test suite, that can be run automatically, or
semi-automatically, to catch future performance regressions in the XDP
stack? Such a test suite would be pretty great to have so we can avoid
the "death by a thousand paper cuts" type of gradual performance
degradation as we add new features...

-Toke