On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote:
> On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@xxxxxxxxx> wrote:
> > >
> > > When it comes to fixing the problem, I don't quite know where to start.
> > > Perhaps these iteration callbacks ought to be treated more like global
> > > functions -- you can't always make assumptions about the state of the
> > > data in the context pointer. Treating the context pointer as totally
> > > opaque seems bad from a usability perspective. Maybe there's a way to
> > > attempt to verify the body of the function by treating all or part of
> > > the context as read-only, and then if that fails, go back and assume
> > > nothing about that part of the context structure. What is the right
> > > way to think about plugging this hole?
> >
> > 'treat as global' might be a way to fix it, but it will likely break
> > some setups, since people passing pointers in a context and current
> > global func verification doesn't support that.
>
> yeah, the intended use case is to return something from callbacks
> through context pointer. So it will definitely break real uses.
>
> > I think the simplest fix would be to disallow writes into any pointers
> > within a ctx. Writes to scalars should still be allowed.
>
> It might be a partial mitigation, but even with SCALARs there could be
> problems due to assumed SCALAR range, which will be invalid if the
> callback is never called or called many times.
>
> > Much more complex fix would be to verify callbacks as
> > process_iter_next_call(). See giant comment next to it.
>
> yep, that seems like the right way forward. We need to simulate 0, 1,
> 2, ... executions of callbacks until we validate and exhaust all
> possible context modification permutations, just like open-coded
> iterators do it.
>
> > But since we need a fix for stable I'd try the simple approach first.
> > Could you try to implement that?

Hi All,

This issue seems stalled, so I took a look over the weekend.
I have a work-in-progress fix; please let me know if you don't agree
with the direction I'm taking or if I'm stepping on anyone's toes.

After some analysis I decided to go with Alexei's suggestion and
implement something similar to iterators for a selected set of helper
functions that accept "looping" callbacks, such as:
- bpf_loop
- bpf_for_each_map_elem
- bpf_user_ringbuf_drain
- bpf_timer_set_callback (?)

The sketch of the fix looks as follows:
- extend struct bpf_func_state with callback iterator state information:
  - active/non-active flag
  - depth
  - r1-r5 state at the entry to the callback;
- extend __check_func_call() to set up callback iterator state when a
  call to a suitable helper function is processed;
- extend BPF_EXIT processing (prepare_func_exit()) to queue a new
  callback visit upon return from an iterating callback
  (similar to process_iter_next_call());
- extend is_state_visited() to account for callback iterator hits in a
  way similar to regular iterators;
- extend backtrack_insn() to correctly react to jumps from callback
  exit to callback entry.
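For reference, the kind of unsafe pattern that needs to be rejected looks
roughly like the sketch below. This is not the exact reproducer from this
thread and all names in it are made up; the point is that the callback
advances a pointer stored in the callback context on every iteration, so a
single verification pass over the callback body only covers the state of
the first iteration:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct loop_ctx {
		char buf[16];
		char *cursor;		/* points into buf */
	};

	/* Each invocation writes through 'cursor' and then advances it.
	 * The write is in bounds for the state checked on the first
	 * simulated iteration, but not for later runtime iterations.
	 */
	static int write_cb(__u32 idx, void *data)
	{
		struct loop_ctx *lc = data;

		*lc->cursor = 0x2a;
		lc->cursor += 8;
		return 0;
	}

	SEC("raw_tp")
	int unsafe_loop(void *ctx)
	{
		struct loop_ctx lc = {};

		lc.cursor = lc.buf;
		/* by the third iteration the write lands past lc.buf */
		bpf_loop(100, write_cb, &lc, 0);
		return 0;
	}

	char _license[] SEC("license") = "GPL";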
I have a patch (at the end of this email) that correctly recognizes
the bpf_loop example in this thread as unsafe. However this patch has
a few deficiencies:

- the verif_scale_strobemeta_bpf_loop selftest is not passing, because
  the read_map_var function invoked from the callback continuously
  increments the 'payload' pointer used in subsequent iterations.
  In order to handle such code I need to add upper bound tracking for
  the iteration depth (when such a bound can be deduced).

- loop detection is broken for a simple callback like the one below:

	static int loop_callback_infinite(__u32 idx, __u64 *data)
	{
		for (;;)
			(*data)++;
		return 0;
	}

  To handle such code I need to change is_state_visited() to do
  callback iterator loop/hit detection on exit from the callback
  (returns are not prune points at the moment); currently it is done
  on entry.

- the callback below currently causes state explosion:

	static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
	{
		if (idx == 0)
			ctx->a = 1;
		if (idx == 1 && ctx->a == 1)
			ctx->b = 1;
		return 0;
	}

  I'm not sure yet what to do about this; there are several possibilities:
  - tweak the order in which states are visited (need to think about it);
  - check states in bpf_verifier_env::head (not explored yet) for
    equivalence and avoid enqueuing duplicate states.

I'll proceed addressing the issues above on Monday.

Thanks,
Eduard

---
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..5589f55e42ba 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -70,6 +70,17 @@ enum bpf_iter_state {
 	BPF_ITER_STATE_DRAINED,
 };
 
+struct bpf_iter {
+	/* BTF container and BTF type ID describing
+	 * struct bpf_iter_<type> of an iterator state
+	 */
+	struct btf *btf;
+	u32 btf_id;
+	/* packing following two fields to fit iter state into 16 bytes */
+	enum bpf_iter_state state:2;
+	int depth:30;
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters. See states_equal() */
 	enum bpf_reg_type type;
@@ -115,16 +126,7 @@ struct bpf_reg_state {
 		} dynptr;
 
 		/* For bpf_iter stack slots */
-		struct {
-			/* BTF container and BTF type ID describing
-			 * struct bpf_iter_<type> of an iterator state
-			 */
-			struct btf *btf;
-			u32 btf_id;
-			/* packing following two fields to fit iter state into 16 bytes */
-			enum bpf_iter_state state:2;
-			int depth:30;
-		} iter;
+		struct bpf_iter iter;
 
 		/* Max size from any of the above. */
 		struct {
@@ -300,6 +302,8 @@ struct bpf_func_state {
 	bool in_callback_fn;
 	struct tnum callback_ret_range;
 	bool in_async_callback_fn;
+	struct bpf_iter callback_iter;
+	struct bpf_reg_state callback_entry_state[MAX_BPF_REG];
 
 	/* The following fields should be last. See copy_func_state() */
 	int acquired_refs;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..e79a4bec4461 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2357,6 +2357,7 @@ static void init_func_state(struct bpf_verifier_env *env,
 	state->callback_ret_range = tnum_range(0, 0);
 	init_reg_state(env, state);
 	mark_verifier_state_scratched(env);
+	state->callback_iter.state = BPF_ITER_STATE_INVALID;
 }
 
 /* Similar to push_stack(), but for async callbacks */
@@ -3599,6 +3600,39 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 		}
 	} else if (opcode == BPF_EXIT) {
 		bool r0_precise;
+		int subprog;
+
+		/* Jump from 'exit' to start of the same subprogram,
+		 * this happens for callback iteration, e.g.:
+		 *
+		 *   int cb_func(...) {
+		 *   start:
+		 *           ...
+		 *           return 0; // <-- BPF_EXIT processing in do_check()
+		 *                     //     adds a state with IP set to 'start'.
+		 *   }
+		 *   bpf_loop(..., cb_func, ...);
+		 *
+		 * Clear r1-r5 as in the callback case above, but do
+		 * not change frame number.
+		 */
+		if (subseq_idx >= 0 &&
+		    ((subprog = find_subprog(env, subseq_idx)) >= 0) &&
+		    idx >= subseq_idx &&
+		    (subprog >= env->subprog_cnt ||
+		     idx < env->subprog_info[subprog + 1].start)) {
+			if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) {
+				verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
+				WARN_ONCE(1, "verifier backtracking bug");
+				return -EFAULT;
+			}
+			if (bt_stack_mask(bt) != 0)
+				return -ENOTSUPP;
+			/* clear r1-r5 in callback subprog's mask */
+			for (i = BPF_REG_1; i <= BPF_REG_5; i++)
+				bt_clear_reg(bt, i);
+			return 0;
+		}
 
 		if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
 			/* if backtracing was looking for registers R1-R5
@@ -8869,13 +8903,17 @@ static int set_callee_state(struct bpf_verifier_env *env,
 			    struct bpf_func_state *caller,
 			    struct bpf_func_state *callee, int insn_idx);
 
+static int set_loop_callback_state(struct bpf_verifier_env *env,
+				   struct bpf_func_state *caller,
+				   struct bpf_func_state *callee, int insn_idx);
+
 static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx, int subprog,
 			     set_callee_state_fn set_callee_state_cb)
 {
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
-	int err;
+	int err, i;
 
 	if (state->curframe + 1 >= MAX_CALL_FRAMES) {
 		verbose(env, "the call stack of %d frames is too deep\n",
@@ -8972,7 +9010,6 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			*insn_idx /* callsite */,
 			state->curframe + 1 /* frameno within this callchain */,
 			subprog /* subprog number within this prog */);
-
 	/* Transfer references to the callee */
 	err = copy_reference_state(callee, caller);
 	if (err)
@@ -8982,6 +9019,14 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	if (err)
 		goto err_out;
 
+	if (set_callee_state_cb == set_loop_callback_state) {
+		callee->callback_iter.state = BPF_ITER_STATE_ACTIVE;
+		callee->callback_iter.depth = 0;
+		for (i = BPF_REG_0; i < MAX_BPF_REG; ++i)
+			copy_register_state(&callee->callback_entry_state[i],
+					    &callee->regs[i]);
+	}
+
 	clear_caller_saved_regs(env, caller->regs);
 
 	/* only increment it after check_reg_arg() finished */
@@ -9256,7 +9301,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 	struct bpf_verifier_state *state = env->cur_state;
 	struct bpf_func_state *caller, *callee;
 	struct bpf_reg_state *r0;
-	int err;
+	int err, i;
 
 	callee = state->frame[state->curframe];
 	r0 = &callee->regs[BPF_REG_0];
@@ -9301,6 +9346,25 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 			return err;
 	}
 
+	if (callee->in_callback_fn && callee->callback_iter.state == BPF_ITER_STATE_ACTIVE) {
+		struct bpf_verifier_state *queued_st;
+		struct bpf_func_state *queued_callee;
+		struct bpf_iter *queued_iter;
+
+		queued_st = push_stack(env, env->subprog_info[callee->subprogno].start,
+				       *insn_idx, false);
+		if (!queued_st)
+			return -ENOMEM;
+
+		queued_callee = queued_st->frame[callee->frameno];
+		queued_iter = &queued_callee->callback_iter;
+		queued_iter->depth++;
+
+		for (i = BPF_REG_0; i < MAX_BPF_REG; ++i)
+			copy_register_state(&queued_callee->regs[i],
+					    &queued_callee->callback_entry_state[i]);
+	}
+
 	*insn_idx = callee->callsite + 1;
 	if (env->log.level & BPF_LOG_LEVEL) {
 		verbose(env, "returning from callee:\n");
@@ -16112,6 +16176,15 @@ static bool is_iter_next_insn(struct bpf_verifier_env *env, int insn_idx)
 	return env->insn_aux_data[insn_idx].is_iter_next;
 }
 
+static bool is_callback_iter_entry(struct bpf_verifier_env *env, int insn_idx)
+{
+	struct bpf_func_state *cur_frame = cur_func(env);
+
+	return cur_frame->in_callback_fn &&
+	       env->subprog_info[cur_frame->subprogno].start == insn_idx &&
+	       cur_frame->callback_iter.state != BPF_ITER_STATE_INVALID;
+}
+
 /* is_state_visited() handles iter_next() (see process_iter_next_call() for
  * terminology) calls specially: as opposed to bounded BPF loops, it *expects*
  * states to match, which otherwise would look like an infinite loop. So while
@@ -16190,6 +16263,9 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
 			if (cur_slot->iter.depth != slot->iter.depth)
 				return true;
 		}
+		if (state->callback_iter.state == BPF_ITER_STATE_ACTIVE &&
+		    state->callback_iter.depth != cur->frame[fr]->callback_iter.depth)
+			return true;
 	}
 	return false;
 }
@@ -16277,6 +16353,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 				}
 				goto skip_inf_loop_check;
 			}
+			if (is_callback_iter_entry(env, insn_idx)) {
+				if (states_equal(env, &sl->state, cur) &&
+				    cur_func(env)->callback_iter.state == BPF_ITER_STATE_ACTIVE)
+					goto hit;
+				goto skip_inf_loop_check;
+			}
 			/* attempt to detect infinite loop to avoid unnecessary doomed work */
 			if (states_maybe_looping(&sl->state, cur) &&
 			    states_equal(env, &sl->state, cur) &&