Joanne Koong <joannekoong@xxxxxx> writes:

> Add benchmark to measure the throughput and latency of the bpf_loop
> call.
>
> Testing this on qemu on my dev machine on 1 thread, the data is
> as follows:
>
> nr_loops: 1
> bpf_loop - throughput: 43.350 ± 0.864 M ops/s, latency: 23.068 ns/op
>
> nr_loops: 10
> bpf_loop - throughput: 69.586 ± 1.722 M ops/s, latency: 14.371 ns/op
>
> nr_loops: 100
> bpf_loop - throughput: 72.046 ± 1.352 M ops/s, latency: 13.880 ns/op
>
> nr_loops: 500
> bpf_loop - throughput: 71.677 ± 1.316 M ops/s, latency: 13.951 ns/op
>
> nr_loops: 1000
> bpf_loop - throughput: 69.435 ± 1.219 M ops/s, latency: 14.402 ns/op
>
> nr_loops: 5000
> bpf_loop - throughput: 72.624 ± 1.162 M ops/s, latency: 13.770 ns/op
>
> nr_loops: 10000
> bpf_loop - throughput: 75.417 ± 1.446 M ops/s, latency: 13.260 ns/op
>
> nr_loops: 50000
> bpf_loop - throughput: 77.400 ± 2.214 M ops/s, latency: 12.920 ns/op
>
> nr_loops: 100000
> bpf_loop - throughput: 78.636 ± 2.107 M ops/s, latency: 12.717 ns/op
>
> nr_loops: 500000
> bpf_loop - throughput: 76.909 ± 2.035 M ops/s, latency: 13.002 ns/op
>
> nr_loops: 1000000
> bpf_loop - throughput: 77.636 ± 1.748 M ops/s, latency: 12.881 ns/op
>
> From this data, we can see that the latency per loop decreases as the
> number of loops increases. On this particular machine, each loop had an
> overhead of about ~13 ns, and we were able to run ~70 million loops
> per second.

The latency figures are great, thanks! I assume these numbers are with
retpolines enabled? Otherwise 12ns seems a bit much... Or is this
because of qemu?

-Toke
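For comparing the two columns in the quoted results: the latency figures appear to be the reciprocal of the throughput figures (1 s = 1e9 ns and 1 M ops = 1e6 ops, so ns/op = 1000 / (M ops/s)). That is an inference from the numbers, not something the patch states, so the "~13 ns per loop" and "~70 million loops per second" in the summary are two views of the same measurement rather than independent data. A minimal Python sketch checking that assumption against a few of the quoted rows; `RESULTS` and `latency_ns_per_op` are hypothetical names used only for illustration:

```python
# A few throughput/latency pairs copied from the quoted benchmark output:
# nr_loops -> (throughput in M ops/s, reported latency in ns/op).
RESULTS = {
    1: (43.350, 23.068),
    10: (69.586, 14.371),
    100: (72.046, 13.880),
    1000000: (77.636, 12.881),
}


def latency_ns_per_op(throughput_mops: float) -> float:
    """Convert throughput in millions of ops/s to average latency in ns/op.

    ns/op = 1e9 ns / (throughput_mops * 1e6 ops) = 1000 / throughput_mops.
    (Assumption: this is how the benchmark derives its latency column.)
    """
    return 1000.0 / throughput_mops


for nr_loops, (mops, reported) in RESULTS.items():
    derived = latency_ns_per_op(mops)
    # Reported values are rounded to 3 decimals, so compare at that precision.
    print(f"nr_loops={nr_loops:>7}: derived {derived:.3f} ns/op, "
          f"reported {reported:.3f} ns/op")
```

Each derived value rounds to the reported latency, which is consistent with the latency column being computed from throughput rather than timed separately.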