On Mon, Jun 13, 2022 at 8:02 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Calls to `bpf_loop` are replaced with direct loops to avoid
> indirection. E.g. the following:
>
>   bpf_loop(10, foo, NULL, 0);
>
> Is replaced by equivalent of the following:
>
>   for (int i = 0; i < 10; ++i)
>     foo(i, NULL);
>
> This transformation could be applied when:
> - callback is known and does not change during program execution;
> - flags passed to `bpf_loop` are always zero.
>
> Inlining logic works as follows:
>
> - During execution simulation function `update_loop_inline_state`
>   tracks the following information for each `bpf_loop` call
>   instruction:
>   - is callback known and constant?
>   - are flags constant and zero?
> - Function `optimize_bpf_loop` increases stack depth for functions
>   where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop`
>   to apply the inlining. The additional stack space is used to spill
>   registers R6, R7 and R8. These registers are used as loop counter,
>   loop maximal bound and callback context parameter;
>
> Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
> i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

[...]
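
For reference, the equivalence described in the commit message can be modelled in plain userspace C. This is only a sketch, not the kernel helper or the verifier output: `model_bpf_loop`, `sum_cb` and `inlined_sum` are hypothetical names, and the upper bound on `nr_loops` that the real helper enforces is left out. It assumes the documented helper contract: the callback returns 0 to continue and nonzero to stop, and the helper returns the number of iterations performed, or -EINVAL when flags are nonzero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model only; every name below is hypothetical. */
typedef long (*loop_cb_t)(uint32_t index, void *ctx);

/* Indirect form: the callback is reached through a function pointer. */
static long model_bpf_loop(uint32_t nr_loops, loop_cb_t cb, void *ctx,
			   uint64_t flags)
{
	uint32_t i;

	if (flags)
		return -EINVAL;		/* nonzero flags are rejected */

	for (i = 0; i < nr_loops; i++)
		if (cb(i, ctx))		/* nonzero return stops the loop */
			return (long)i + 1;

	return i;			/* number of iterations performed */
}

/* Example callback: accumulate the loop index into *ctx. */
static long sum_cb(uint32_t index, void *ctx)
{
	*(long *)ctx += index;
	return 0;
}

/*
 * Direct form: what the loop looks like once the callback is known
 * and flags are known to be zero, so no indirection is left.
 */
static long inlined_sum(uint32_t nr_loops, void *ctx)
{
	uint32_t i;

	for (i = 0; i < nr_loops; i++)
		if (sum_cb(i, ctx))
			return (long)i + 1;

	return i;
}
```

Both forms perform the same iterations and return the same count for a given `nr_loops`; the second one simply has the call target fixed at the call site, which is the property the two preconditions above are meant to guarantee.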
> +static int optimize_bpf_loop(struct bpf_verifier_env *env)
> +{
> +	struct bpf_subprog_info *subprogs = env->subprog_info;
> +	int i, cur_subprog = 0, cnt, delta = 0;
> +	struct bpf_insn *insn = env->prog->insnsi;
> +	int insn_cnt = env->prog->len;
> +	u16 stack_depth = subprogs[cur_subprog].stack_depth;
> +	u16 stack_depth_extra = 0;
> +
> +	for (i = 0; i < insn_cnt; i++, insn++) {
> +		struct bpf_loop_inline_state *inline_state =
> +			&env->insn_aux_data[i + delta].loop_inline_state;
> +
> +		if (is_bpf_loop_call(insn) && inline_state->fit_for_inline) {
> +			struct bpf_prog *new_prog;
> +
> +			stack_depth_extra = BPF_REG_SIZE * 3;
> +			new_prog = inline_bpf_loop(env,
> +						   i + delta,
> +						   -(stack_depth + stack_depth_extra),
> +						   inline_state->callback_subprogno,
> +						   &cnt);
> +			if (!new_prog)
> +				return -ENOMEM;

We do not fail over for -ENOMEM, which is reasonable. (It is also
reasonable if we do fail the program with -ENOMEM.) However, if we don't
fail the program, we need to update stack_depth properly before
returning, right?

Thanks,
Song

> +
> +			delta += cnt - 1;
> +			env->prog = new_prog;
> +			insn = new_prog->insnsi + i + delta;
> +		}
> +
> +		if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> +			subprogs[cur_subprog].stack_depth += stack_depth_extra;
> +			cur_subprog++;
> +			stack_depth = subprogs[cur_subprog].stack_depth;
> +			stack_depth_extra = 0;
> +		}
> +	}
> +
> +	env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
> +
> +	return 0;
> +}
> +
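
To make the stack_depth concern concrete, here is a small userspace model of the bookkeeping. It is a sketch under stated assumptions, not a proposed patch: `model_subprog`, `model_optimize` and `MODEL_REG_SIZE` are hypothetical names, the instruction walk is reduced to a per-subprogram count of inlinable calls, and the extra depth is recorded only after a call has been inlined successfully. It takes no position on whether the program should be rejected on -ENOMEM; it only shows the extra depth being flushed on the early-return path as well.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_REG_SIZE 8	/* assumed to match BPF_REG_SIZE (8 bytes) */

/* Hypothetical stand-in for the per-subprogram state. */
struct model_subprog {
	unsigned short stack_depth;	/* bytes */
	int inlinable_calls;		/* bpf_loop calls fit for inlining */
};

/*
 * Walk all subprograms and "inline" their calls. The call identified
 * by (fail_subprog, fail_call) simulates an allocation failure; pass
 * -1 for both to simulate a run without failures.
 * Returns 0 on success, -1 on the simulated failure.
 */
static int model_optimize(struct model_subprog *sp, size_t n,
			  int fail_subprog, int fail_call)
{
	for (size_t s = 0; s < n; s++) {
		unsigned short extra = 0;
		int ret = 0;

		for (int c = 0; c < sp[s].inlinable_calls; c++) {
			if ((int)s == fail_subprog && c == fail_call) {
				ret = -1;
				break;
			}
			/* Room to spill three registers (R6, R7, R8). */
			extra = MODEL_REG_SIZE * 3;
		}

		/* Flush the pending extra depth on both exit paths. */
		sp[s].stack_depth += extra;
		if (ret)
			return ret;
	}

	return 0;
}
```

With two subprograms of depth 16 and 8, a failure on the second call of the first subprogram still leaves that subprogram at depth 40, because one call was already inlined there and its spill slots are in use.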