Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Fri, Nov 19, 2021 at 5:04 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>>
>> > On Thu, Nov 18, 2021 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >>
>> >> Joanne Koong <joannekoong@xxxxxx> writes:
>> >>
>> >> > Add benchmark to measure the overhead of the bpf_for_each call
>> >> > for a specified number of iterations.
>> >> >
>> >> > Testing this on qemu on my dev machine on 1 thread, the data is
>> >> > as follows:
>> >>
>> >> Absolute numbers from some random dev machine are not terribly useful;
>> >> others have no way of replicating your tests. A more meaningful
>> >> benchmark would need a baseline to compare against; in this case I guess
>> >> that would be a regular loop? Do you have any numbers comparing the
>> >> callback to just looping?
>> >
>> > Measuring an empty for (int i = 0; i < N; i++) is meaningless; you should
>> > expect a number in the billions of "operations" per second on modern
>> > server CPUs, so that won't tell you anything. These numbers are useful
>> > as a ballpark figure for the overhead of the bpf_for_each() helper
>> > and its callbacks, and 12 ns per "iteration" gives a good idea of how
>> > slow that can be. Depending on your hardware it can differ by 2x,
>> > maybe 3x, but not 100x.
>> >
>> > But measuring inc + cmp + jne as a baseline is both unrealistic and
>> > doesn't add much extra information. You can assume 2B/s, give or take.
>> >
>> > And you can also run this benchmark yourself on your own hardware to
>> > get "real" numbers, as much as you can expect real numbers from an
>> > artificial microbenchmark, of course.
>> >
>> > I read those numbers as "plenty fast" :)
>>
>> Hmm, okay, fair enough, but I think it would be good to have the "~12 ns
>> per iteration" figure featured prominently in the commit message, then :)
>>
>
> We discussed with Joanne offline adding an ops_report_final() helper
> that will output both throughput (X ops/s) and latency/overhead
> ((1000000000/X) ns/op), so that no one has to do any math.

Alright, sounds good, thanks!

-Toke
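
[Editor's note: for readers unfamiliar with the conversion mentioned above, here is a
minimal, self-contained C sketch of the reporting idea being discussed: given a measured
operation count and elapsed wall time, print both throughput (ops/s) and the derived
per-op latency (1e9 / throughput, in ns/op). The function name, signature, and sample
numbers below are illustrative assumptions only, not the actual ops_report_final()
helper in the selftests/bpf bench harness.]

    /*
     * Sketch of throughput + per-op latency reporting. Name, signature, and
     * the example inputs are hypothetical; they are not the kernel's bench code.
     */
    #include <stdio.h>

    static void report_ops_final(double total_ops, double total_secs)
    {
    	double ops_per_sec = total_ops / total_secs;
    	double ns_per_op = 1000000000.0 / ops_per_sec;

    	printf("Summary: throughput %8.3lf M ops/s, latency %8.3lf ns/op\n",
    	       ops_per_sec / 1000000.0, ns_per_op);
    }

    int main(void)
    {
    	/*
    	 * Illustrative input roughly consistent with the ~12 ns/iteration
    	 * figure from the thread: ~85M callback invocations over ~1.02 s
    	 * yields ~83.3M ops/s, i.e. ~12.0 ns/op.
    	 */
    	report_ops_final(85000000.0, 1.02);
    	return 0;
    }

[The point of folding this into a shared reporting helper, as proposed above, is that
every benchmark then prints both views of the same measurement, so reviewers can read
off the per-iteration overhead directly instead of dividing 1e9 by the throughput.]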