On Mon, Jun 20, 2022 at 4:54 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > Calls to `bpf_loop` are replaced with direct loops to avoid > indirection. E.g. the following: > > bpf_loop(10, foo, NULL, 0); > > Is replaced by equivalent of the following: > > for (int i = 0; i < 10; ++i) > foo(i, NULL); > > This transformation could be applied when: > - callback is known and does not change during program execution; > - flags passed to `bpf_loop` are always zero. > > Inlining logic works as follows: > > - During execution simulation function `update_loop_inline_state` > tracks the following information for each `bpf_loop` call > instruction: > - is callback known and constant? > - are flags constant and zero? > - Function `optimize_bpf_loop` increases stack depth for functions > where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop` > to apply the inlining. The additional stack space is used to spill > registers R6, R7 and R8. These registers are used as loop counter, > loop maximal bound and callback context parameter; > > Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on > i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op. > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> Acked-by: Song Liu <songliubraving@xxxxxx>