Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Tue, 23 Nov 2021 20:19:34 +0100

Joanne Koong <joannekoong@xxxxxx> writes:

> Add benchmark to measure the throughput and latency of the bpf_loop
> call.
>
> Testing this on qemu on my dev machine on 1 thread, the data is
> as follows:
>
>         nr_loops: 1
> bpf_loop - throughput: 43.350 ± 0.864 M ops/s, latency: 23.068 ns/op
>
>         nr_loops: 10
> bpf_loop - throughput: 69.586 ± 1.722 M ops/s, latency: 14.371 ns/op
>
>         nr_loops: 100
> bpf_loop - throughput: 72.046 ± 1.352 M ops/s, latency: 13.880 ns/op
>
>         nr_loops: 500
> bpf_loop - throughput: 71.677 ± 1.316 M ops/s, latency: 13.951 ns/op
>
>         nr_loops: 1000
> bpf_loop - throughput: 69.435 ± 1.219 M ops/s, latency: 14.402 ns/op
>
>         nr_loops: 5000
> bpf_loop - throughput: 72.624 ± 1.162 M ops/s, latency: 13.770 ns/op
>
>         nr_loops: 10000
> bpf_loop - throughput: 75.417 ± 1.446 M ops/s, latency: 13.260 ns/op
>
>         nr_loops: 50000
> bpf_loop - throughput: 77.400 ± 2.214 M ops/s, latency: 12.920 ns/op
>
>         nr_loops: 100000
> bpf_loop - throughput: 78.636 ± 2.107 M ops/s, latency: 12.717 ns/op
>
>         nr_loops: 500000
> bpf_loop - throughput: 76.909 ± 2.035 M ops/s, latency: 13.002 ns/op
>
>         nr_loops: 1000000
> bpf_loop - throughput: 77.636 ± 1.748 M ops/s, latency: 12.881 ns/op
>
> From this data, we can see that the latency per loop decreases as the
> number of loops increases. On this particular machine, each loop had an
> overhead of about 13 ns, and we were able to run about 70 million loops
> per second.

The latency figures are great, thanks! I assume these numbers are with
retpolines enabled? Otherwise 12 ns per loop seems a bit much... Or is
this because of qemu?
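
As a quick sanity check on the quoted numbers: the latency column looks
like the reciprocal of the throughput column (1000 / M ops/s gives
ns/op). A minimal sketch of that arithmetic, assuming this is how the
benchmark derives it; the names RESULTS and latency_ns are made up for
illustration and are not taken from the patch:

```python
# Throughput in M ops/s for each nr_loops value, copied from the quoted results.
RESULTS = {
    1: 43.350,
    10: 69.586,
    100: 72.046,
    500: 71.677,
    1000: 69.435,
    5000: 72.624,
    10000: 75.417,
    50000: 77.400,
    100000: 78.636,
    500000: 76.909,
    1000000: 77.636,
}


def latency_ns(throughput_mops):
    """Convert throughput in millions of ops/s to nanoseconds per op.

    Assumption: latency is simply the reciprocal of throughput, i.e.
    1e9 ns/s divided by (throughput_mops * 1e6 ops/s).
    """
    return 1e3 / throughput_mops


for nr_loops, mops in RESULTS.items():
    print(f"nr_loops={nr_loops:>7}: {latency_ns(mops):.3f} ns/op")
```

If that assumption holds, the two columns carry the same information,
and the per-loop cost on this setup settles at roughly 13 ns once
nr_loops is 10 or more.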

-Toke