Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark

Joanne Koong <joannekoong@xxxxxx> · Tue, 23 Nov 2021 16:20:06 -0800

On 11/23/21 11:19 AM, Toke Høiland-Jørgensen wrote:

> Joanne Koong <joannekoong@xxxxxx> writes:
>
>> Add a benchmark to measure the throughput and latency of the bpf_loop
>> call.
>>
>> Testing this on qemu on my dev machine on 1 thread, the data is
>> as follows:
>>
>>          nr_loops: 1
>> bpf_loop - throughput: 43.350 ± 0.864 M ops/s, latency: 23.068 ns/op
>>
>>          nr_loops: 10
>> bpf_loop - throughput: 69.586 ± 1.722 M ops/s, latency: 14.371 ns/op
>>
>>          nr_loops: 100
>> bpf_loop - throughput: 72.046 ± 1.352 M ops/s, latency: 13.880 ns/op
>>
>>          nr_loops: 500
>> bpf_loop - throughput: 71.677 ± 1.316 M ops/s, latency: 13.951 ns/op
>>
>>          nr_loops: 1000
>> bpf_loop - throughput: 69.435 ± 1.219 M ops/s, latency: 14.402 ns/op
>>
>>          nr_loops: 5000
>> bpf_loop - throughput: 72.624 ± 1.162 M ops/s, latency: 13.770 ns/op
>>
>>          nr_loops: 10000
>> bpf_loop - throughput: 75.417 ± 1.446 M ops/s, latency: 13.260 ns/op
>>
>>          nr_loops: 50000
>> bpf_loop - throughput: 77.400 ± 2.214 M ops/s, latency: 12.920 ns/op
>>
>>          nr_loops: 100000
>> bpf_loop - throughput: 78.636 ± 2.107 M ops/s, latency: 12.717 ns/op
>>
>>          nr_loops: 500000
>> bpf_loop - throughput: 76.909 ± 2.035 M ops/s, latency: 13.002 ns/op
>>
>>          nr_loops: 1000000
>> bpf_loop - throughput: 77.636 ± 1.748 M ops/s, latency: 12.881 ns/op
>>
>> From this data, we can see that the latency per loop decreases as the
>> number of loops increases. On this particular machine, each loop had an
>> overhead of about 13 ns, and we were able to run ~70 million loops
>> per second.
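The latency and throughput columns are two views of the same measurement. One M ops/s means one op per microsecond, so latency in ns/op is 1000 divided by throughput in M ops/s, and every row of the qemu table satisfies this. The short Python sketch below makes the conversion explicit. It assumes, without confirmation from the thread, that the "±" is the sample standard deviation over per-interval samples. The helper names are hypothetical, and the sample values in the usage lines are made up.

```python
import statistics


def latency_from_throughput(mops):
    """Convert a throughput in M ops/s into a latency in ns/op."""
    # 1 M ops/s == 1 op per microsecond == 1000 ns per op.
    return 1000.0 / mops


def summarize(samples_mops):
    """Summarize per-interval throughput samples given in M ops/s.

    Assumption: the "±" in the benchmark output is the sample standard
    deviation across intervals. This does not reproduce the selftest's
    own reporting code.
    """
    mean = statistics.mean(samples_mops)
    stddev = statistics.stdev(samples_mops) if len(samples_mops) > 1 else 0.0
    return mean, stddev, latency_from_throughput(mean)


# Two posted qemu rows, reproduced from the throughput column alone.
for mops in (43.350, 77.636):
    print(f"{mops:.3f} M ops/s -> {latency_from_throughput(mops):.3f} ns/op")

# Hypothetical per-interval samples, not taken from the thread.
mean, stddev, lat = summarize([42.5, 43.4, 44.1])
print(f"throughput: {mean:.3f} ± {stddev:.3f} M ops/s, latency: {lat:.3f} ns/op")
```

Because latency is derived from mean throughput rather than measured separately, a row with a large "±" also has an uncertain latency figure.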
>
> The latency figures are great, thanks! I assume these numbers are with
> retpolines enabled? Otherwise 12 ns seems a bit much... Or is this
> because of qemu?

I just tested it on a machine (without retpoline enabled) that runs on
actual hardware, and here is what I found:

            nr_loops: 1
    bpf_loop - throughput: 46.780 ± 0.064 M ops/s, latency: 21.377 ns/op

            nr_loops: 10
    bpf_loop - throughput: 198.519 ± 0.155 M ops/s, latency: 5.037 ns/op

            nr_loops: 100
    bpf_loop - throughput: 247.448 ± 0.305 M ops/s, latency: 4.041 ns/op

            nr_loops: 500
    bpf_loop - throughput: 260.839 ± 0.380 M ops/s, latency: 3.834 ns/op

            nr_loops: 1000
    bpf_loop - throughput: 262.806 ± 0.629 M ops/s, latency: 3.805 ns/op

            nr_loops: 5000
    bpf_loop - throughput: 264.211 ± 1.508 M ops/s, latency: 3.785 ns/op

            nr_loops: 10000
    bpf_loop - throughput: 265.366 ± 3.054 M ops/s, latency: 3.768 ns/op

            nr_loops: 50000
    bpf_loop - throughput: 235.986 ± 20.205 M ops/s, latency: 4.238 ns/op

            nr_loops: 100000
    bpf_loop - throughput: 264.482 ± 0.279 M ops/s, latency: 3.781 ns/op

            nr_loops: 500000
    bpf_loop - throughput: 309.773 ± 87.713 M ops/s, latency: 3.228 ns/op

            nr_loops: 1000000
    bpf_loop - throughput: 262.818 ± 4.143 M ops/s, latency: 3.805 ns/op

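The latency is about 4 ns per loop. Once nr_loops reaches a few hundred, it
settles at roughly 3.8 ns. The 50000 and 500000 runs were noisy, as their
large ± values show.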
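On the bare-metal machine, the per-loop latency is about 21 ns at nr_loops = 1 and about 3.8 ns for larger counts. That pattern fits a simple amortization picture: a fixed cost paid once per bpf_loop() invocation, spread over nr_loops iterations, plus a constant per-iteration cost. The Python sketch below assumes that two-parameter model. It is an illustration only, not something the benchmark computes. It fits latency = b + a / nr_loops by least squares to the bare-metal rows posted above and leaves out the two noisy runs. `fit_overhead` and `BARE_METAL_ROWS` are hypothetical names.

```python
# Bare-metal rows posted above, as (nr_loops, latency in ns/op). The two
# noisy runs (nr_loops 50000 and 500000) are deliberately left out.
BARE_METAL_ROWS = [
    (1, 21.377), (10, 5.037), (100, 4.041), (500, 3.834), (1000, 3.805),
    (5000, 3.785), (10000, 3.768), (100000, 3.781), (1000000, 3.805),
]


def fit_overhead(rows):
    """Least-squares fit of latency_ns = b + a / nr_loops.

    Hypothetical model: each bpf_loop() invocation pays a fixed cost of
    `a` ns plus `b` ns per iteration, so the cost per iteration is
    b + a / n. Returns (a, b).
    """
    xs = [1.0 / n for n, _ in rows]
    ys = [lat for _, lat in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope: cost paid once per invocation (ns)
    b = mean_y - a * mean_x  # intercept: cost per iteration (ns)
    return a, b


a, b = fit_overhead(BARE_METAL_ROWS)
print(f"fixed per-call overhead ~{a:.1f} ns, per-iteration cost ~{b:.2f} ns")
```

The intercept is the number that corresponds to "latency per loop" for large nr_loops. The slope lumps together everything paid once per invocation, so the fit cannot attribute that cost to any particular part of the call path.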

I will update the commit message in v3 with these new numbers as well.

> -Toke