On 2024-05-10 18:21:24 [+0200], To Jesper Dangaard Brouer wrote:
> The XDP redirect process is two staged:
…

On 2024-05-07 15:27:44 [+0200], Jesper Dangaard Brouer wrote:
> > I need/want to echo Toke's request to benchmark these changes.

I have:
  boxA: ixgbe
  boxB: i40e

Both are bigger NUMA boxes. I have to patch ixgbe to ignore the 64-CPU
limit, and I boot the box with only 64 CPUs. The IOMMU has been disabled
on both boxes, as well as CPU mitigations. The link is 10G.

The base for testing is commit
   a17ef9e6c2c1c ("net_sched: sch_sfq: annotate data-races around q->perturb_period")

which I used to rebase my series on top of.

pktgen_sample03_burst_single_flow.sh has been used to send packets and
"xdp-bench drop $nic -e" to receive them.

baseline
~~~~~~~~
boxB -> boxA | gov performance
-t2 (to pktgen)
| receive total 14,854,233 pkt/s  14,854,233 drop/s  0 error/s

-t1 (to pktgen)
| receive total 10,642,895 pkt/s  10,642,895 drop/s  0 error/s

boxB -> boxA | gov powersave
-t2 (to pktgen)
  receive total 10,196,085 pkt/s  10,196,085 drop/s  0 error/s
  receive total 10,187,254 pkt/s  10,187,254 drop/s  0 error/s
  receive total 10,553,298 pkt/s  10,553,298 drop/s  0 error/s

-t1
  receive total 10,427,732 pkt/s  10,427,732 drop/s  0 error/s

======
boxA -> boxB (-t1) gov performance
performance:
  receive total 13,171,962 pkt/s  13,171,962 drop/s  0 error/s
  receive total 13,368,344 pkt/s  13,368,344 drop/s  0 error/s

powersave:
  receive total 13,343,136 pkt/s  13,343,136 drop/s  0 error/s
  receive total 13,220,326 pkt/s  13,220,326 drop/s  0 error/s

(I think the CPU governor had no impact here, just noise)

The series applied (with updated 14/15)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
boxB -> boxA | gov performance
-t2:
  receive total 14,880,199 pkt/s  14,880,199 drop/s  0 error/s

-t1:
  receive total 10,769,082 pkt/s  10,769,082 drop/s  0 error/s

boxB -> boxA | gov powersave
-t2:
  receive total 11,163,323 pkt/s  11,163,323 drop/s  0 error/s

-t1:
  receive total 10,756,515 pkt/s  10,756,515 drop/s  0 error/s

boxA -> boxB | gov performance
  receive total 13,395,919 pkt/s  13,395,919 drop/s  0 error/s

boxA -> boxB | gov performance
  receive total 13,290,527 pkt/s  13,290,527 drop/s  0 error/s

Based on my numbers, the difference is just noise. BoxA hits the CPU
limit during receive when the CPU frequency is lowered. BoxB seems to be
unaffected by lowering the CPU frequency during receive.

I can't comment on anything >10G due to HW limits.

Sebastian
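
For anyone who wants to put the boxB -> boxA numbers side by side, here is a minimal Python sketch. It only reuses the pkt/s values reported above. The helper name rel_change_pct and the dictionary keys are made up for illustration, and the 10G line-rate figure assumes minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap on the wire, which is an assumption about the pktgen setup and not something stated in the results.

```python
from statistics import mean

# pkt/s values copied from the results above (boxB -> boxA only).
# The powersave -t2 baseline had three runs, so it is kept as a list.
baseline = {
    "performance -t2": [14_854_233],
    "performance -t1": [10_642_895],
    "powersave -t2": [10_196_085, 10_187_254, 10_553_298],
    "powersave -t1": [10_427_732],
}
series = {
    "performance -t2": [14_880_199],
    "performance -t1": [10_769_082],
    "powersave -t2": [11_163_323],
    "powersave -t1": [10_756_515],
}


def rel_change_pct(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Assumption: 64-byte frames + 8 bytes preamble + 12 bytes inter-frame gap.
LINE_RATE_10G_PPS = 10e9 / ((64 + 20) * 8)

# Run-to-run spread inside the one baseline case that has repeated runs.
runs = baseline["powersave -t2"]
baseline_spread_pct = rel_change_pct(min(runs), max(runs))

for case in baseline:
    delta = rel_change_pct(mean(baseline[case]), mean(series[case]))
    print(f"{case:16s} {delta:+.2f}% vs. baseline")

print(f"baseline powersave -t2 run-to-run spread: {baseline_spread_pct:.2f}%")
print(f"theoretical 10G rate at 64-byte frames: {LINE_RATE_10G_PPS:,.0f} pkt/s")
```

Comparing each delta against the run-to-run spread of repeated runs gives a rough feel for what counts as noise; with only one to three samples per case this is not a statistical test.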