Hi,

One of the tests for our XDP-based load balancer has gotten quite slow,
so I dug in. Roughly, it simulates one million distinct packets arriving
at the load balancer by calling BPF_PROG_TEST_RUN a million times.

    distribution_test.go:40: 1000000 iterations
    distribution_test.go:99: Coefficient of variation: 0.52%
    --- PASS: TestLoadBalancerDistribution (0.00s)
        --- PASS: TestLoadBalancerDistribution/32_endpoints (22.04s)

You can see that the test takes about 22s. Running the same test with
slight variations in three threads results in this:

    distribution_test.go:40: 1000000 iterations
    === CONT  TestLoadBalancerDistribution/32_endpoints
    distribution_test.go:99: Coefficient of variation: 0.60%
    === CONT  TestLoadBalancerDistribution/64_endpoints
    distribution_test.go:99: Coefficient of variation: 0.82%
    === CONT  TestLoadBalancerDistribution/128_endpoints
    distribution_test.go:99: Coefficient of variation: 1.24%
    --- PASS: TestLoadBalancerDistribution (0.00s)
        --- PASS: TestLoadBalancerDistribution/32_endpoints (55.61s)
        --- PASS: TestLoadBalancerDistribution/64_endpoints (55.61s)
        --- PASS: TestLoadBalancerDistribution/128_endpoints (55.61s)

Each subtest now takes 55.61s, roughly 2.5 times the single-threaded
run, so it's pretty clear that something is serialising the threads.
Digging around in perf reveals that the culprit is bpf_prog_change_xdp,
called from bpf_prog_test_run_xdp. The call was added in f23c4b3924d2
("bpf: Start using the BPF dispatcher in BPF_TEST_RUN").

Is there something we can do about this? Maybe only call into the
dispatcher when repeat > 1?

Best
Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com