On Thu, Nov 18, 2021 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Joanne Koong <joannekoong@xxxxxx> writes:
>
> > Add benchmark to measure the overhead of the bpf_for_each call
> > for a specified number of iterations.
> >
> > Testing this on qemu on my dev machine on 1 thread, the data is
> > as follows:
>
> Absolute numbers from some random dev machine are not terribly useful;
> others have no way of replicating your tests. A more meaningful
> benchmark would need a baseline to compare to; in this case I guess that
> would be a regular loop? Do you have any numbers comparing the callback
> to just looping?

Measuring an empty for (int i = 0; i < N; i++) loop is meaningless; you
should expect a number in the billions of "operations" per second on
modern server CPUs, so that will give you no idea.

Those numbers are useful as a ballpark for the overhead of the
bpf_for_each() helper and its callbacks. And 12ns per "iteration" is
meaningful enough to give a good idea of how slow that can be. Depending
on your hardware it can differ by 2x, maybe 3x, but not 100x.

But measuring inc + cmp + jne as a baseline is both unrealistic and
doesn't give much extra information. You can assume 2B/s, give or take.

And you can also run this benchmark yourself on your own hardware to get
"real" numbers, as much as you can expect real numbers from an
artificial microbenchmark, of course.

I read those numbers as "plenty fast" :)

>
> -Toke
>
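
To put the two figures above side by side, here is a minimal sketch in C
of the arithmetic, assuming the ~12ns per bpf_for_each() "iteration"
quoted above and the assumed ~2B/s rate for an empty loop. The function
names are hypothetical and exist only for this illustration; none of
this is part of the benchmark patch.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of the patch. */

/* Convert a per-iteration cost in nanoseconds to iterations per second. */
static double ns_per_iter_to_ops_per_sec(double ns_per_iter)
{
	assert(ns_per_iter > 0.0);	/* guard against division by zero */
	return 1e9 / ns_per_iter;
}

/* Convert a rate in iterations per second to nanoseconds per iteration. */
static double ops_per_sec_to_ns_per_iter(double ops_per_sec)
{
	assert(ops_per_sec > 0.0);	/* guard against division by zero */
	return 1e9 / ops_per_sec;
}

/* How many times more expensive one per-iteration cost is than a baseline. */
static double slowdown(double ns_per_iter, double baseline_ns_per_iter)
{
	assert(baseline_ns_per_iter > 0.0);
	return ns_per_iter / baseline_ns_per_iter;
}
```

With these, ns_per_iter_to_ops_per_sec(12.0) comes to roughly 83 million
iterations per second, ops_per_sec_to_ns_per_iter(2e9) is 0.5ns, and
slowdown(12.0, 0.5) is 24. So under those assumptions one callback
iteration costs on the order of two dozen empty-loop iterations, and a
2x-3x hardware difference would put the 12ns figure somewhere between
about 4ns and 36ns.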