Re: [PATCH v5 bpf-next 1/4] bpf: Introduce may_goto instruction

On Tue, Mar 5, 2024 at 10:38 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Introduce may_goto instruction that acts on a hidden bpf_iter_num, so that
> > bpf_iter_num_new(), bpf_iter_num_destroy() don't need to be called explicitly.
>
> bpf_iter_num was probably an inspiration, but I think by now the
> analogy is pretty weak. bpf_iter_num_next() returns NULL or pointer to
> int (i.e., it returns some usable value), while may_goto jumps or not.
> So it's not just implicit new/destroy. The above doesn't confuse me,
> but I wonder if someone less familiar with iterators would be confused
> by the above?

Agree. Will reword.

>
> > It can be used in any normal "for" or "while" loop, like
> >
> >   for (i = zero; i < cnt; cond_break, i++) {
> >
> > The verifier recognizes that may_goto is used in the program,
> > reserves additional 8 bytes of stack, initializes them in subprog
> > prologue, and replaces may_goto instruction with:
> > aux_reg = *(u64 *)(fp - 40)
> > if aux_reg == 0 goto pc+off
> > aux_reg += 1
>
> `aux_reg -= 1`?

+1 for -1
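
For clarity, the corrected expansion (decrementing the hidden counter)
would then read:

aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg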

>
> > *(u64 *)(fp - 40) = aux_reg
> >
> > may_goto instruction can be used by LLVM to implement __builtin_memcpy,
> > __builtin_strcmp.
> >
> > may_goto is not a full substitute for bpf_for() macro.
> > bpf_for() doesn't have an induction variable that the verifier sees,
> > so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.
> >
> > But when the code is written as:
> > for (i = 0; i < 100; cond_break, i++)
> > the verifier sees 'i' as a precise constant zero,
> > hence cond_break (aka may_goto) doesn't help to converge the loop.
> > A static or global variable can be used as a workaround:
> > static int zero = 0;
> > for (i = zero; i < 100; cond_break, i++) // works!
> >
> > may_goto works well with arena pointers that don't need to be bounds-checked
> > on every iteration. Load/store from arena returns imprecise unbounded scalars.
> >
> > Reserve new opcode BPF_JMP | BPF_JMA for may_goto insn.
> > JMA stands for "jump maybe", and "jump multipurpose", and "jump multi always".
> > Since goto_or_nop insn was proposed, it may use the same opcode.
> > may_goto vs goto_or_nop can be distinguished by src_reg:
> > code = BPF_JMP | BPF_JMA:
> > src_reg = 0 - may_goto
> > src_reg = 1 - goto_or_nop
> > We could have reused BPF_JMP | BPF_JA like:
> > src_reg = 0 - normal goto
> > src_reg = 1 - may_goto
> > src_reg = 2 - goto_or_nop
> > but JA is a real insn and it's unconditional, while may_goto and goto_or_nop
> > are pseudo instructions, and both are conditional. Hence it's better to
> > have a different opcode for them. Hence BPF_JMA.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> >  include/linux/bpf_verifier.h   |   2 +
> >  include/uapi/linux/bpf.h       |   1 +
> >  kernel/bpf/core.c              |   1 +
> >  kernel/bpf/disasm.c            |   3 +
> >  kernel/bpf/verifier.c          | 156 ++++++++++++++++++++++++++-------
> >  tools/include/uapi/linux/bpf.h |   1 +
> >  6 files changed, 134 insertions(+), 30 deletions(-)
> >
>
> Not a huge fan of BPF_JMA, but there is no clear naming winner.
> BPF_JAUX, BPF_JPSEUDO, BPF_JMAYBE would be a bit more
> greppable/recognizable, but it's not a big deal.

In the next version I'm planning to go with BPF_JCOND
to describe a new class of conditional pseudo jumps.
A comment will hopefully be good enough to avoid confusion
with the existing conditional jumps (all of them except BPF_JA).
I will also add
enum {
 BPF_MAY_GOTO = 0,
};
and check that src_reg is equal to that.
Then in the future it can be extended with:
 BPF_NOP_OR_GOTO = 1,
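
With that rename, the may_goto branch in check_cond_jmp_op() would start
with something like this (error message text is just illustrative):

	if (insn->code == (BPF_JMP | BPF_JCOND)) {
		if (insn->src_reg != BPF_MAY_GOTO) {
			verbose(env, "invalid src_reg %d in BPF_JCOND insn\n",
				insn->src_reg);
			return -EINVAL;
		}
		/* rest of the may_goto handling as in the hunk below */
	}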

> Left few nits below, but overall LGTM
>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84365e6dd85d..917ca603059b 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -449,6 +449,7 @@ struct bpf_verifier_state {
> >         u32 jmp_history_cnt;
> >         u32 dfs_depth;
> >         u32 callback_unroll_depth;
> > +       u32 may_goto_cnt;
>
> naming nit: seems like we consistently use "depth" terminology for
> bpf_loop and open-coded iters, any reason to deviate with "cnt"
> terminology here?

sure.

> >  };
> >
> >  #define bpf_get_spilled_reg(slot, frame, mask)                         \
>
> [...]
>
> >
> > +static bool is_may_goto_insn(struct bpf_verifier_env *env, int insn_idx)
> > +{
> > +       return env->prog->insnsi[insn_idx].code == (BPF_JMP | BPF_JMA);
> > +}
> > +
> >  /* process_iter_next_call() is called when verifier gets to iterator's next
> >   * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer
> >   * to it as just "iter_next()" in comments below.
> > @@ -14871,11 +14877,35 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >         int err;
> >
> >         /* Only conditional jumps are expected to reach here. */
> > -       if (opcode == BPF_JA || opcode > BPF_JSLE) {
> > +       if (opcode == BPF_JA || opcode > BPF_JMA) {
> >                 verbose(env, "invalid BPF_JMP/JMP32 opcode %x\n", opcode);
> >                 return -EINVAL;
> >         }
> >
> > +       if (opcode == BPF_JMA) {
> > +               struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st;
> > +               int idx = *insn_idx;
> > +
> > +               if (insn->code != (BPF_JMP | BPF_JMA) ||
> > +                   insn->src_reg || insn->dst_reg || insn->imm || insn->off == 0) {
> > +                       verbose(env, "invalid may_goto off %d imm %d\n",
> > +                               insn->off, insn->imm);
> > +                       return -EINVAL;
> > +               }
> > +               prev_st = find_prev_entry(env, cur_st->parent, idx);
> > +
> > +               /* branch out 'fallthrough' insn as a new state to explore */
> > +               queued_st = push_stack(env, idx + 1, idx, false);
> > +               if (!queued_st)
> > +                       return -ENOMEM;
> > +
> > +               queued_st->may_goto_cnt++;
> > +               if (prev_st)
> > +                       widen_imprecise_scalars(env, prev_st, queued_st);
> > +               *insn_idx += insn->off;
> > +               return 0;
> > +       }
> > +
> >         /* check src2 operand */
> >         err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> >         if (err)
> > @@ -15659,6 +15689,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >         default:
> >                 /* conditional jump with two edges */
> >                 mark_prune_point(env, t);
> > +               if (insn->code == (BPF_JMP | BPF_JMA))
>
> maybe use is_may_goto_insn() here for consistency?

+1
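
i.e. that spot in visit_insn() would become:

		if (is_may_goto_insn(env, t))
			mark_force_checkpoint(env, t);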

> > +                       mark_force_checkpoint(env, t);
> >
> >                 ret = push_insn(t, t + 1, FALLTHROUGH, env);
> >                 if (ret)
>
> [...]
>
> >  patch_call_imm:
> >                 fn = env->ops->get_func_proto(insn->imm, env->prog);
> > @@ -19952,6 +20015,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         return -EFAULT;
> >                 }
> >                 insn->imm = fn->func - __bpf_call_base;
> > +next_insn:
> > +               if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> > +                       subprogs[cur_subprog].stack_depth += stack_depth_extra;
> > +                       subprogs[cur_subprog].stack_extra = stack_depth_extra;
> > +                       cur_subprog++;
> > +                       stack_depth = subprogs[cur_subprog].stack_depth;
> > +                       stack_depth_extra = 0;
> > +               }
> > +               i++; insn++;
>
> Is there a code path where we don't do i++, insn++? From cursory look
> at this loop, I think we always do this, so not sure why `i++, insn++`
> had to be moved from for() clause?

I was worried about missing a 'continue' that should have been
replaced with 'goto next_insn'. A leftover 'continue' would work
for all tests except the may_goto tests, and the arena patch set
adds another hunk to this loop that is also written with 'continue'.
Technically we can keep 'i++; insn++;' in the for()...
if we carefully code review any future additions.
In other words, the stack_extra logic plus the i and insn increments
are part of the same logical block. It's an action that should be
done after each insn, hence it's better to keep them together:
either inside for() as I did in v1/v2, or here toward the last '}'.
The latter is more canonical C.
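
To illustrate, the loop shape I'm aiming for is roughly (a sketch; names
as in the hunk above, loop bound and most of the body elided):

	for (i = 0, insn = prog->insnsi; i < insn_cnt; ) {
		/* per-insn rewrites; every early exit does 'goto next_insn'
		 * instead of 'continue'
		 */
next_insn:
		if (subprogs[cur_subprog + 1].start == i + delta + 1) {
			/* flush stack_extra accounting for the finished subprog */
		}
		i++;
		insn++;
	}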

>
> But if I missed it and we have to do these increments here, these are
> two separate statements, so let's put them on separate lines?

sure.

>
> > +       }
> > +
> > +       env->prog->aux->stack_depth = subprogs[0].stack_depth;
> > +       for (i = 0; i < env->subprog_cnt; i++) {
> > +               int subprog_start = subprogs[i].start, j;
> > +               int stack_slots = subprogs[i].stack_extra / 8;
> > +
> > +               if (stack_slots >= ARRAY_SIZE(insn_buf)) {
> > +                       verbose(env, "verifier bug: stack_extra is too large\n");
> > +                       return -EFAULT;
> > +               }
> > +
> > +               /* Add insns to subprog prologue to init extra stack */
> > +               for (j = 0; j < stack_slots; j++)
> > +                       insn_buf[j] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
> > +                                                -subprogs[i].stack_depth + j * 8, BPF_MAX_LOOPS);
> > +               if (j) {
> > +                       insn_buf[j] = env->prog->insnsi[subprog_start];
> > +
> > +                       new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, j + 1);
> > +                       if (!new_prog)
> > +                               return -ENOMEM;
> > +                       env->prog = prog = new_prog;
> > +               }
>
> this code is sort of generic (you don't assume just 0 or 1 extra
> slots), but then it initializes each extra slot with BPF_MAX_LOOPS,
> which doesn't look generic at all. So it's neither as simple as it
> could be nor generic, really...
>
> Maybe let's add WARN_ON if stack_extra>1 (so we catch it if we ever
> extend this), but otherwise just have a simple and easier to follow
>
> if (stack_slots) {
>     insn_buf[0] = BPF_ST_MEM(..., BPF_MAX_LOOPS);
>     /* bpf_patch_insn_data() replaces instruction, so we need to
>      * copy first actual insn to preserve it (it's not that obvious)
>      */
>     insn_buf[1] = env->prog->insnsi[subprog_start];
>     ... patch ...
> }
>
> It's pretty minor, overall, but definitely caused some pause for me.

I had it like that in v2, but then thought that I might
try to use this to fix the tail_call-from-subprogs issue discussed
in a separate thread, so I changed it to a generic zero init in v3,
and then went with the BPF_MAX_LOOPS init in v4.
Looking at it now, hard coding stack_slots == 1 does indeed seem better.
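
Something along these lines then (untested sketch):

	if (stack_slots) {
		if (stack_slots > 1) {
			verbose(env, "verifier bug: stack_slots supports may_goto only\n");
			return -EFAULT;
		}
		/* prologue: init the hidden loop count slot with BPF_MAX_LOOPS */
		insn_buf[0] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
					 -subprogs[i].stack_depth, BPF_MAX_LOOPS);
		/* bpf_patch_insn_data() replaces the insn at subprog_start,
		 * so keep a copy of the subprog's original first insn
		 */
		insn_buf[1] = env->prog->insnsi[subprog_start];

		new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, 2);
		if (!new_prog)
			return -ENOMEM;
		env->prog = prog = new_prog;
	}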




