On 01/08/2022 23.11, Zvi Effron wrote:
> I see that many XDP patchset submissions to the bpf mailing list include benchmark numbers for packet throughput to show how much the change improves (or worsens) performance.
It is very important to show the *change* in performance. Meaning baseline numbers for comparison are more important than the absolute performance numbers.
> They frequently show numbers for a single core test.
The single core, or actually single RX-queue, test is important to XDP, for reasons that might surprise you(?). The intuitive reason is that it is easier to reason about and do calculations on, as we know the CPU is kept 100% busy. The non-intuitive reason is that when scaling up with more CPUs, XDP is so fast that the hardware becomes the bottleneck and the CPUs start to have idle cycles. This is MUCH harder to reason about and understand, and is often misinterpreted. The xdp-paper benchmarks[2] document examples where the HW is the bottleneck and how we identify the relevant counters via ethtool_stats.pl [3].
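A quick way to see which RX-queue the packets land in is simply reading the driver's ethtool counters. A minimal sketch (assuming an mlx5 NIC named mlx5p2; counter names vary per driver); ethtool_stats.pl [3] essentially automates this and turns the counters into per-second rates:

 # Per-RX-queue packet counters (naming differs per driver, e.g. rxN_packets on mlx5)
 $ ethtool -S mlx5p2 | grep -E 'rx[0-9]+_packets'

 # Watch the counters update to eyeball the per-queue rate
 $ watch -d -n 1 "ethtool -S mlx5p2 | grep -E 'rx[0-9]+_packets'"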
> I was wondering what methodology people are using to generate these benchmark results?
On the packet *generator*, I usually use the kernel's pktgen via the scripts in the kernel tree under samples/pktgen/ [1].
 [1] https://github.com/torvalds/linux/tree/master/samples/pktgen

Example command:
 $ ./samples/pktgen/pktgen_sample03_burst_single_flow.sh -vi mlx5p2 -d 10.40.40.2 -m 3c:fd:fe:b3:31:49 -t 12
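If you instead want the generator traffic to spread over multiple RX-queues on the DUT (the scaling-up case mentioned above), the same samples/pktgen/ directory also contains multi-flow variants. A sketch that simply reuses the parameters from the command above; check the script itself for the exact options it accepts:

 $ ./samples/pktgen/pktgen_sample05_flow_per_thread.sh -vi mlx5p2 -d 10.40.40.2 -m 3c:fd:fe:b3:31:49 -t 12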
As the script name "pktgen_sample03_burst_single_flow" indicates, this generates a single flow, which will cause the RSS-hash in the NIC to hit a single RX-queue. The '-t 12' means 12 CPU cores will be generating this traffic.

Our xdp-paper has detailed records of the benchmarking we did:
 [2] https://github.com/xdp-project/xdp-paper/tree/master/benchmarks

On the Device Under Test (DUT) I usually run the sample "xdp_rxq_info", which reports stats on a per RX-queue + CPU basis (example invocation below).

I'm interested in hearing what others do?
--Jesper

[3] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
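The xdp_rxq_info invocation I use looks something like the below (the sample lives under samples/bpf/ in the kernel tree; NIC name is just an example, and option names are quoted from memory, so consult its usage output):

 $ sudo ./samples/bpf/xdp_rxq_info --dev mlx5p2 --action XDP_DROP

This attaches a small XDP program and prints packets-per-second stats per RX-queue and per CPU, which makes it easy to confirm that the single-flow traffic really only hits one RX-queue.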