On Tue, Mar 5, 2024 at 10:38 AM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Mar 4, 2024 at 8:52 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Introduce may_goto instruction that acts on a hidden bpf_iter_num, so that
> > bpf_iter_num_new(), bpf_iter_num_destroy() don't need to be called explicitly.
>
> bpf_iter_num was probably an inspiration, but I think by now the
> analogy is pretty weak. bpf_iter_num_next() returns NULL or pointer to
> int (i.e., it returns some usable value), while may_goto jumps or not.
> So it's not just implicit new/destroy. The above doesn't confuse me,
> but I wonder if someone less familiar with iterators would be confused
> by the above?

Agree. Will reword.

>
> > It can be used in any normal "for" or "while" loop, like
> >
> >   for (i = zero; i < cnt; cond_break, i++) {
> >
> > The verifier recognizes that may_goto is used in the program,
> > reserves an additional 8 bytes of stack, initializes them in subprog
> > prologue, and replaces the may_goto instruction with:
> >   aux_reg = *(u64 *)(fp - 40)
> >   if aux_reg == 0 goto pc+off
> >   aux_reg += 1
>
> `aux_reg -= 1`?

+1 for -1

>
> >   *(u64 *)(fp - 40) = aux_reg
> >
> > may_goto instruction can be used by LLVM to implement __builtin_memcpy,
> > __builtin_strcmp.
> >
> > may_goto is not a full substitute for the bpf_for() macro.
> > bpf_for() doesn't have an induction variable that the verifier sees,
> > so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.
> >
> > But when the code is written as:
> >   for (i = 0; i < 100; cond_break, i++)
> > the verifier sees 'i' as a precise constant zero,
> > hence cond_break (aka may_goto) doesn't help to converge the loop.
> > A static or global variable can be used as a workaround:
> >   static int zero = 0;
> >   for (i = zero; i < 100; cond_break, i++) // works!
> >
> > may_goto works well with arena pointers that don't need to be bounds-checked
> > on every iteration. Load/store from arena returns imprecise unbounded scalars.
> >
> > Reserve new opcode BPF_JMP | BPF_JMA for may_goto insn.
> > JMA stands for "jump maybe", "jump multipurpose", and "jump multi always".
> > Since a goto_or_nop insn was proposed, it may use the same opcode.
> > may_goto vs goto_or_nop can be distinguished by src_reg:
> >   code = BPF_JMP | BPF_JMA:
> >   src_reg = 0 - may_goto
> >   src_reg = 1 - goto_or_nop
> > We could have reused BPF_JMP | BPF_JA like:
> >   src_reg = 0 - normal goto
> >   src_reg = 1 - may_goto
> >   src_reg = 2 - goto_or_nop
> > but JA is a real insn and it's unconditional, while may_goto and goto_or_nop
> > are pseudo instructions, and both are conditional. Hence it's better to
> > have a different opcode for them. Hence BPF_JMA.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> >  include/linux/bpf_verifier.h   |   2 +
> >  include/uapi/linux/bpf.h       |   1 +
> >  kernel/bpf/core.c              |   1 +
> >  kernel/bpf/disasm.c            |   3 +
> >  kernel/bpf/verifier.c          | 156 ++++++++++++++++++++++++++-------
> >  tools/include/uapi/linux/bpf.h |   1 +
> >  6 files changed, 134 insertions(+), 30 deletions(-)
> >
>
> Not a huge fan of BPF_JMA, but there is no clear naming winner.
> BPF_JAUX, BPF_JPSEUDO, or BPF_JMAYBE would be a bit more
> greppable/recognizable, but it's not a big deal.

In the next version I'm planning to go with BPF_JCOND to describe a new
class of conditional pseudo jumps. A comment will hopefully be enough to
avoid confusion with the existing conditional jumps (all of them except
BPF_JA).

I will also add
enum {
        BPF_MAY_GOTO = 0,
};
and check that src_reg is equal to that. Then in the future it can be
extended with:
        BPF_NOP_OR_GOTO = 1,
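Something along these lines (sketch only to illustrate the plan; the exact
opcode value, comment wording, and verifier check are TBD and not part of
this patch):

/* include/uapi/linux/bpf.h */
#define BPF_JCOND       0xe0    /* conditional pseudo jumps: may_goto, goto_or_nop */

enum {
        BPF_MAY_GOTO = 0,
};

and in check_cond_jmp_op(), reject unknown variants:

        if (insn->src_reg != BPF_MAY_GOTO) {
                verbose(env, "invalid src_reg %d in BPF_JCOND insn\n", insn->src_reg);
                return -EINVAL;
        }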
> Left few nits below, but overall LGTM
>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 84365e6dd85d..917ca603059b 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -449,6 +449,7 @@ struct bpf_verifier_state {
> >         u32 jmp_history_cnt;
> >         u32 dfs_depth;
> >         u32 callback_unroll_depth;
> > +       u32 may_goto_cnt;
>
> naming nit: seems like we consistently use "depth" terminology for
> bpf_loop and open-coded iters, any reason to deviate with "cnt"
> terminology here?

sure.

> >  };
> >
> >  #define bpf_get_spilled_reg(slot, frame, mask) \
>
> [...]
>
> > +static bool is_may_goto_insn(struct bpf_verifier_env *env, int insn_idx)
> > +{
> > +       return env->prog->insnsi[insn_idx].code == (BPF_JMP | BPF_JMA);
> > +}
> > +
> >  /* process_iter_next_call() is called when verifier gets to iterator's next
> >   * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer
> >   * to it as just "iter_next()" in comments below.
> > @@ -14871,11 +14877,35 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >         int err;
> >
> >         /* Only conditional jumps are expected to reach here. */
> > -       if (opcode == BPF_JA || opcode > BPF_JSLE) {
> > +       if (opcode == BPF_JA || opcode > BPF_JMA) {
> >                 verbose(env, "invalid BPF_JMP/JMP32 opcode %x\n", opcode);
> >                 return -EINVAL;
> >         }
> >
> > +       if (opcode == BPF_JMA) {
> > +               struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st;
> > +               int idx = *insn_idx;
> > +
> > +               if (insn->code != (BPF_JMP | BPF_JMA) ||
> > +                   insn->src_reg || insn->dst_reg || insn->imm || insn->off == 0) {
> > +                       verbose(env, "invalid may_goto off %d imm %d\n",
> > +                               insn->off, insn->imm);
> > +                       return -EINVAL;
> > +               }
> > +               prev_st = find_prev_entry(env, cur_st->parent, idx);
> > +
> > +               /* branch out 'fallthrough' insn as a new state to explore */
> > +               queued_st = push_stack(env, idx + 1, idx, false);
> > +               if (!queued_st)
> > +                       return -ENOMEM;
> > +
> > +               queued_st->may_goto_cnt++;
> > +               if (prev_st)
> > +                       widen_imprecise_scalars(env, prev_st, queued_st);
> > +               *insn_idx += insn->off;
> > +               return 0;
> > +       }
> > +
> >         /* check src2 operand */
> >         err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> >         if (err)
> > @@ -15659,6 +15689,8 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
> >         default:
> >                 /* conditional jump with two edges */
> >                 mark_prune_point(env, t);
> > +               if (insn->code == (BPF_JMP | BPF_JMA))
>
> maybe use is_may_goto_insn() here for consistency?

+1

> > +                       mark_force_checkpoint(env, t);
> >
> >                 ret = push_insn(t, t + 1, FALLTHROUGH, env);
> >                 if (ret)
>
> [...]
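With the is_may_goto_insn() helper, that visit_insn() hunk would end up
looking roughly like this (illustrative sketch, not the actual next-version
diff):

        default:
                /* conditional jump with two edges */
                mark_prune_point(env, t);
                if (is_may_goto_insn(env, t))
                        mark_force_checkpoint(env, t);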
>
> >  patch_call_imm:
> >                 fn = env->ops->get_func_proto(insn->imm, env->prog);
> > @@ -19952,6 +20015,39 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                         return -EFAULT;
> >                 }
> >                 insn->imm = fn->func - __bpf_call_base;
> > +next_insn:
> > +               if (subprogs[cur_subprog + 1].start == i + delta + 1) {
> > +                       subprogs[cur_subprog].stack_depth += stack_depth_extra;
> > +                       subprogs[cur_subprog].stack_extra = stack_depth_extra;
> > +                       cur_subprog++;
> > +                       stack_depth = subprogs[cur_subprog].stack_depth;
> > +                       stack_depth_extra = 0;
> > +               }
> > +               i++; insn++;
>
> Is there a code path where we don't do i++, insn++? From cursory look
> at this loop, I think we always do this, so not sure why `i++, insn++`
> had to be moved from for() clause?

I was worried about missing a 'continue' that should have been replaced
with 'goto next_insn'. A leftover 'continue' would still work for all
tests except the may_goto ones, and the arena patch set adds another hunk
to this loop that is also written with 'continue'.
Technically we could keep 'i++; insn++;' in the for() clause, if we review
any future additions carefully.
In other words, the stack_extra logic plus the i and insn increments are
part of the same logical block. It's an action that should be done after
each insn, hence it's better to keep them together: either inside for(),
as I did in v1/v2, or here toward the last '}'. The latter is more
canonical C.

>
> But if I missed it and we have to do these increments here, these are
> two separate statements, so let's put them on separate lines?

sure.

> > +       }
> > +
> > +       env->prog->aux->stack_depth = subprogs[0].stack_depth;
> > +       for (i = 0; i < env->subprog_cnt; i++) {
> > +               int subprog_start = subprogs[i].start, j;
> > +               int stack_slots = subprogs[i].stack_extra / 8;
> > +
> > +               if (stack_slots >= ARRAY_SIZE(insn_buf)) {
> > +                       verbose(env, "verifier bug: stack_extra is too large\n");
> > +                       return -EFAULT;
> > +               }
> > +
> > +               /* Add insns to subprog prologue to init extra stack */
> > +               for (j = 0; j < stack_slots; j++)
> > +                       insn_buf[j] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
> > +                                                -subprogs[i].stack_depth + j * 8, BPF_MAX_LOOPS);
> > +               if (j) {
> > +                       insn_buf[j] = env->prog->insnsi[subprog_start];
> > +
> > +                       new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, j + 1);
> > +                       if (!new_prog)
> > +                               return -ENOMEM;
> > +                       env->prog = prog = new_prog;
> > +               }
>
> this code is sort of generic (you don't assume just 0 or 1 extra
> slots), but then it initializes each extra slot with BPF_MAX_LOOPS,
> which doesn't look generic at all. So it's neither as simple as it
> could be nor generic, really...
>
> Maybe let's add WARN_ON if stack_extra > 1 (so we catch it if we ever
> extend this), but otherwise just have a simple and easier to follow
>
> if (stack_slots) {
>     insn_buf[0] = BPF_ST_MEM(..., BPF_MAX_LOOPS);
>     /* bpf_patch_insn_data() replaces instruction,
>      * so we need to copy first actual insn to preserve it
>      * (it's not that obvious)
>      */
>     insn_buf[1] = env->prog->insnsi[subprog_start];
>     ... patch ...
> }
>
> It's pretty minor, overall, but definitely caused some pause for me.

I had it like that in v2, but then thought that I might try to use this
to fix the tail_call-from-subprogs issue discussed in the separate thread,
so I changed it to a generic zero init in v3, and then went with the
BPF_MAX_LOOPS init in v4.
Looking at this now, it seems hard coding stack_slots == 1 is indeed
better.
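I.e., something like this on top of the current patch's names (sketch
only, not necessarily what the next version will look like):

                if (stack_slots) {
                        if (stack_slots > 1) {
                                verbose(env, "verifier bug: stack_extra is too large\n");
                                return -EFAULT;
                        }
                        /* init the hidden may_goto counter to BPF_MAX_LOOPS */
                        insn_buf[0] = BPF_ST_MEM(BPF_DW, BPF_REG_FP,
                                                 -subprogs[i].stack_depth, BPF_MAX_LOOPS);
                        /* bpf_patch_insn_data() replaces the insn at subprog_start,
                         * so copy the subprog's original first insn to preserve it
                         */
                        insn_buf[1] = env->prog->insnsi[subprog_start];

                        new_prog = bpf_patch_insn_data(env, subprog_start, insn_buf, 2);
                        if (!new_prog)
                                return -ENOMEM;
                        env->prog = prog = new_prog;
                }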