Re: [PATCH bpf-next v6 3/5] bpf: Inline calls to bpf_loop when callback is known

Song Liu <song@xxxxxxxxxx> · Mon, 13 Jun 2022 08:48:00 -0700

On Mon, Jun 13, 2022 at 8:02 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Calls to `bpf_loop` are replaced with direct loops to avoid
> indirection. E.g. the following:
>
>   bpf_loop(10, foo, NULL, 0);
>
> is replaced by the equivalent of the following:
>
>   for (int i = 0; i < 10; ++i)
>     foo(i, NULL);
>
> This transformation can be applied when:
> - callback is known and does not change during program execution;
> - flags passed to `bpf_loop` are always zero.
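
For readers following along, a minimal userspace model of the transformation described above. This is only a sketch: `model_bpf_loop`, `model_inlined` and `sum_until_five` are hypothetical names, the early-exit rule (a non-zero callback return stops the loop, and the number of iterations performed is returned) reflects my understanding of the helper rather than anything quoted in this patch, and the upper bound check on the iteration count is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; the real helper lives in the kernel. */
typedef long (*loop_cb_t)(uint32_t index, void *ctx);

/* Models the helper call: every iteration goes through a function pointer. */
static long model_bpf_loop(uint32_t nr_loops, loop_cb_t cb, void *ctx,
			   uint64_t flags)
{
	uint32_t i;

	assert(cb != NULL);
	if (flags != 0)
		return -1; /* stands in for the helper rejecting non-zero flags */

	for (i = 0; i < nr_loops; i++)
		if (cb(i, ctx) != 0)
			return (long)i + 1; /* non-zero return stops the loop */
	return i;
}

/* Hypothetical callback: adds each index to *ctx, stops at index 5. */
static long sum_until_five(uint32_t index, void *ctx)
{
	long *sum = ctx;

	*sum += index;
	return index >= 5;
}

/* Models the inlined form: same loop, but the callee is a direct call. */
static long model_inlined(uint32_t nr_loops, void *ctx)
{
	uint32_t i;

	for (i = 0; i < nr_loops; i++)
		if (sum_until_five(i, ctx) != 0)
			return (long)i + 1;
	return i;
}
```

With `nr_loops` set to 10, both variants stop after six iterations and leave 15 in the accumulator; the only difference is that the second one no longer needs an indirect call, which is possible only because the callback is known and the flags are zero.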
>
> Inlining logic works as follows:
>
> - During execution simulation, the function `update_loop_inline_state`
>   tracks the following information for each `bpf_loop` call
>   instruction:
>   - is callback known and constant?
>   - are flags constant and zero?
> - Function `optimize_bpf_loop` increases stack depth for functions
>   where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop`
>   to apply the inlining. The additional stack space is used to spill
>   registers R6, R7 and R8. These registers hold the loop counter,
>   the maximum loop bound and the callback context parameter.
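
A small sketch of the stack arithmetic implied by the description above, for illustration only. The base offset `-(stack_depth + 3 * BPF_REG_SIZE)` matches what `optimize_bpf_loop` passes to `inline_bpf_loop` in the quoted hunk; the assignment of R6, R7 and R8 to consecutive slots is an assumption on my part, and `spill_slots` / `model_spill_slots` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_REG_SIZE 8 /* stands in for BPF_REG_SIZE (8 bytes) */

/* Frame-pointer-relative offsets of the three spill slots. */
struct spill_slots {
	int32_t r6_off; /* assumed: first slot, at the new stack base */
	int32_t r7_off; /* assumed: second slot */
	int32_t r8_off; /* assumed: third slot */
};

/* Computes the slots that sit just below the subprogram's existing stack. */
static struct spill_slots model_spill_slots(uint16_t stack_depth)
{
	int32_t base = -((int32_t)stack_depth + MODEL_REG_SIZE * 3);
	struct spill_slots s = {
		.r6_off = base,
		.r7_off = base + MODEL_REG_SIZE,
		.r8_off = base + 2 * MODEL_REG_SIZE,
	};

	assert(base < 0); /* BPF stack offsets are negative from the frame pointer */
	return s;
}
```

For a subprogram that already uses 16 bytes of stack, the slots land at -40, -32 and -24, all below the existing frame, which is why the subprogram's recorded stack depth has to grow by 24 bytes.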
>
> Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
> i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
[...]

> +static int optimize_bpf_loop(struct bpf_verifier_env *env)
> +{
> +       struct bpf_subprog_info *subprogs = env->subprog_info;
> +       int i, cur_subprog = 0, cnt, delta = 0;
> +       struct bpf_insn *insn = env->prog->insnsi;
> +       int insn_cnt = env->prog->len;
> +       u16 stack_depth = subprogs[cur_subprog].stack_depth;
> +       u16 stack_depth_extra = 0;
> +
> +       for (i = 0; i < insn_cnt; i++, insn++) {
> +               struct bpf_loop_inline_state *inline_state =
> +                       &env->insn_aux_data[i + delta].loop_inline_state;
> +
> +               if (is_bpf_loop_call(insn) && inline_state->fit_for_inline) {
> +                       struct bpf_prog *new_prog;
> +
> +                       stack_depth_extra = BPF_REG_SIZE * 3;
> +                       new_prog = inline_bpf_loop(env,
> +                                                  i + delta,
> +                                                  -(stack_depth + stack_depth_extra),
> +                                                  inline_state->callback_subprogno,
> +                                                  &cnt);
> +                       if (!new_prog)
> +                               return -ENOMEM;

We do not fail the program on -ENOMEM, which is reasonable. (It would also
be reasonable to fail the program with -ENOMEM.) However, if we don't fail
the program, we need to update stack_depth properly before returning, right?
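
To illustrate the concern, a minimal userspace sketch, assuming the caller tolerates the error and keeps the partially patched program. It is not kernel code: `model_subprog` and `model_optimize` are hypothetical, instruction expansion (`delta`) is ignored, and a failing allocation is simulated with the `fail_at` index.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SIZE 8 /* stands in for BPF_REG_SIZE */

struct model_subprog {
	int start;            /* index of the first instruction */
	uint16_t stack_depth; /* bytes of stack used by the subprogram */
};

/*
 * Hypothetical model of the walk in optimize_bpf_loop().
 * inlinable[i] != 0 marks a call that would be inlined; fail_at is the
 * index where the simulated allocation fails (-1 means never).
 * subprogs[] ends with a sentinel entry whose start equals insn_cnt.
 */
static int model_optimize(struct model_subprog *subprogs, const int *inlinable,
			  int insn_cnt, int fail_at)
{
	uint16_t extra = 0;
	int i, cur = 0;

	for (i = 0; i < insn_cnt; i++) {
		if (inlinable[i]) {
			if (i == fail_at) {
				/*
				 * Flush the pending growth so that loops already
				 * inlined in this subprogram keep their spill slots.
				 */
				subprogs[cur].stack_depth += extra;
				return -1; /* stands in for -ENOMEM */
			}
			extra = REG_SIZE * 3;
		}

		/* Subprogram boundary: commit the growth and move on. */
		if (subprogs[cur + 1].start == i + 1) {
			subprogs[cur].stack_depth += extra;
			cur++;
			extra = 0;
		}
	}
	return 0;
}
```

Without the flush in the error path, a subprogram whose first `bpf_loop` call was already inlined would keep its old stack depth, and the spilled R6, R7 and R8 would land outside the accounted stack. In the real function the top-level `env->prog->aux->stack_depth` would presumably need the same treatment before returning.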

Thanks,
Song

> +
> +                       delta     += cnt - 1;
> +                       env->prog  = new_prog;
> +                       insn       = new_prog->insnsi + i + delta;
> +               }
> +
> +               if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> +                       subprogs[cur_subprog].stack_depth += stack_depth_extra;
> +                       cur_subprog++;
> +                       stack_depth = subprogs[cur_subprog].stack_depth;
> +                       stack_depth_extra = 0;
> +               }
> +       }
> +
> +       env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
> +
> +       return 0;
> +}
> +