Re: [PATCH bpf-next v8 3/5] bpf: Inline calls to bpf_loop when callback is known

Song Liu <song@xxxxxxxxxx> · Mon, 20 Jun 2022 17:35:50 -0700



On Mon, Jun 20, 2022 at 4:54 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Calls to `bpf_loop` are replaced with direct loops to avoid
> indirection. E.g. the following:
>
>   bpf_loop(10, foo, NULL, 0);
>
> Is replaced by equivalent of the following:
>
>   for (int i = 0; i < 10; ++i)
>     foo(i, NULL);
>
> This transformation could be applied when:
> - callback is known and does not change during program execution;
> - flags passed to `bpf_loop` are always zero.
>
> Inlining logic works as follows:
>
> - During execution simulation function `update_loop_inline_state`
>   tracks the following information for each `bpf_loop` call
>   instruction:
>   - is callback known and constant?
>   - are flags constant and zero?
> - Function `optimize_bpf_loop` increases stack depth for functions
>   where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop`
>   to apply the inlining. The additional stack space is used to spill
>   registers R6, R7 and R8. These registers are used as loop counter,
>   loop maximal bound and callback context parameter;
>
> Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
> i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

Acked-by: Song Liu <songliubraving@xxxxxx>