Hello,

I have a question regarding bpf_redirect/bpf_redirect_map and latency that we are seeing in a test.

The environment is as follows:

- Debian Bullseye, running the 5.18.0-0.bpo.1-amd64 kernel from bullseye-backports (also tested on 5.16)
- Intel Xeon X3430 @ 2.40GHz, 4 cores, no HT
- Intel X710-DA2 using the i40e driver included with the kernel
- Both interfaces (enp1s0f0 and enp1s0f1) in a simple netfilter bridge
- Ring parameters for rx/tx both set to the maximum of 4096, with no other NIC-specific parameters changed

Each interface has 4 combined IRQs, pinned with the set_irq_affinity script. `irqbalance` is not installed. Traffic is generated by a directly attached machine running iperf3 3.9 (`iperf3 -c 192.168.1.3 -t 0 --bidir`) to a directly attached server on the other side. The machine under test does nothing more than forward packets as a transparent bridge.

An XDP program is installed on f0 to redirect to f1, and another on f1 to redirect to f0. I have tried programs that simply call `bpf_redirect()`, as well as programs that share a device map and call `bpf_redirect_map()`, with identical results (a simplified sketch of the devmap variant is appended below my signature).

When the channel parameters for each interface are reduced to a single IRQ via `ethtool -L enp1s0f0 combined 1`, and both interfaces' IRQs are bound to the same CPU core via smp_affinity, XDP produces improved bitrate with reduced CPU utilization compared to the non-XDP tests:

- Stock netfilter bridge: 9.11 Gbps in both directions at 98% utilization of the pinned core.
- XDP: approximately 9.18 Gbps in both directions at 50% utilization of the pinned core.

However, when multiple cores are engaged (combined 4, with set_irq_affinity), XDP processes markedly fewer packets per second (950,000 vs. approximately 1.6 million). iperf3 also shows a large number of retransmissions with XDP regardless of how many cores are engaged (approximately 6,500 over 2 minutes with XDP vs. 850 in the single-core tests).

This is a sample taken from the xdp_monitor tool in linux/samples/bpf, showing redirection and transmission of packets with XDP engaged:

```
Summary                    944,508 redir/s    0 err,drop/s    944,506 xmit/s
  kthread                        0 pkt/s      0 drop/s              0 sched
redirect total             944,508 redir/s
  cpu:0                    470,148 redir/s
  cpu:2                     15,078 redir/s
  cpu:3                    459,282 redir/s
redirect_err                     0 error/s
xdp_exception                    0 hit/s
devmap_xmit total          944,506 xmit/s     0 drop/s        0 drv_err/s
  cpu:0                    470,148 xmit/s     0 drop/s        0 drv_err/s
  cpu:2                     15,078 xmit/s     0 drop/s        0 drv_err/s
  cpu:3                    459,280 xmit/s     0 drop/s        0 drv_err/s
xmit enp1s0f0->enp1s0f1    485,249 xmit/s     0 drop/s        0 drv_err/s
  cpu:0                    470,172 xmit/s     0 drop/s        0 drv_err/s
  cpu:2                     15,078 xmit/s     0 drop/s        0 drv_err/s
xmit enp1s0f1->enp1s0f0    459,263 xmit/s     0 drop/s        0 drv_err/s
  cpu:3                    459,263 xmit/s     0 drop/s        0 drv_err/s
```

Our current hypothesis is that this is a CPU affinity issue: we believe a different core is being used for transmission than for reception. To prove or disprove this, how can we measure whether bpf_redirect() is causing packets to be transmitted by a different core than the one they were received on? (A rough tracepoint-based counting sketch is also appended below my signature.)

We are still trying to understand how bpf_redirect() selects which core/IRQ to transmit on, and would appreciate any insight or follow-up material to research. Any additional information on how we might be able to overcome this would be deeply appreciated!

Best regards,
Adam Smith
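
For reference, the devmap variant we load is along these lines. This is a simplified sketch only: the map and program names are illustrative, the real programs choose the devmap key per attached interface, and the per-CPU RX counter is added here purely to show how we could record which core received each packet.

```c
// SPDX-License-Identifier: GPL-2.0
/* Simplified sketch of the devmap-based redirect program.
 * Map/program names are illustrative. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Userspace fills slot 0 with the peer interface's ifindex before attach. */
struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
	__uint(max_entries, 1);
} tx_port SEC(".maps");

/* Per-CPU packet counter, so the receiving core can be compared with the
 * per-CPU devmap_xmit lines reported by xdp_monitor. */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__type(key, __u32);
	__type(value, __u64);
	__uint(max_entries, 1);
} rx_cnt SEC(".maps");

SEC("xdp")
int xdp_redirect_peer(struct xdp_md *ctx)
{
	__u32 key = 0;
	__u64 *cnt = bpf_map_lookup_elem(&rx_cnt, &key);

	if (cnt)
		*cnt += 1;

	/* Redirect to the peer interface stored at index 0 of the devmap. */
	return bpf_redirect_map(&tx_port, 0, 0);
}

char _license[] SEC("license") = "GPL";
```

The loader populates tx_port[0] with the peer's ifindex via bpf_map_update_elem() before attaching, and the per-CPU rx_cnt values can be inspected with `bpftool map dump name rx_cnt`.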
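
As for measuring which core transmits versus which core receives: would counting hits on the xdp:xdp_redirect and xdp:xdp_devmap_xmit tracepoints per CPU, roughly as below, be a sensible way to compare the two? This is only a sketch under my own assumptions: the program and map names and the MAX_CPUS bound are arbitrary, and xdp_devmap_xmit fires once per bulk flush rather than once per packet.

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch: count xdp:xdp_redirect and xdp:xdp_devmap_xmit tracepoint hits
 * per CPU, to see whether the core that performed the redirect differs
 * from the core that flushed the devmap. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_CPUS 64	/* arbitrary bound; large enough for this box */

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, __u32);
	__type(value, __u64);
	__uint(max_entries, MAX_CPUS);
} redir_hits SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, __u32);
	__type(value, __u64);
	__uint(max_entries, MAX_CPUS);
} xmit_hits SEC(".maps");

/* Bump the slot for the CPU this tracepoint fired on. */
static __always_inline void count_cpu(void *map)
{
	__u32 cpu = bpf_get_smp_processor_id();
	__u64 *cnt = bpf_map_lookup_elem(map, &cpu);

	if (cnt)
		__sync_fetch_and_add(cnt, 1);
}

SEC("tracepoint/xdp/xdp_redirect")
int tp_xdp_redirect(void *ctx)
{
	count_cpu(&redir_hits);
	return 0;
}

SEC("tracepoint/xdp/xdp_devmap_xmit")
int tp_xdp_devmap_xmit(void *ctx)
{
	count_cpu(&xmit_hits);
	return 0;
}

char _license[] SEC("license") = "GPL";
```

The two arrays could then be dumped with `bpftool map dump name redir_hits` and `bpftool map dump name xmit_hits` and compared against the per-CPU lines from xdp_monitor.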