On 8/21/23 10:37 PM, Yonghong Song wrote: > > > On 8/21/23 12:33 PM, Dave Marchevsky wrote: >> An earlier patch in the series ensures that the underlying memory of >> nodes with bpf_refcount - which can have multiple owners - is not reused >> until RCU grace period has elapsed. This prevents >> use-after-free with non-owning references that may point to >> recently-freed memory. While RCU read lock is held, it's safe to >> dereference such a non-owning ref, as by definition RCU GP couldn't have >> elapsed and therefore underlying memory couldn't have been reused. >> >> From the perspective of verifier "trustedness" non-owning refs to >> refcounted nodes are now trusted only in RCU CS and therefore should no >> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them >> MEM_RCU in order to reflect this new state. >> >> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx> >> --- >> include/linux/bpf.h | 3 ++- >> kernel/bpf/verifier.c | 13 ++++++++++++- >> 2 files changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index eced6400f778..12596af59c00 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -653,7 +653,8 @@ enum bpf_type_flag { >> MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS), >> /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning. >> - * Currently only valid for linked-list and rbtree nodes. >> + * Currently only valid for linked-list and rbtree nodes. If the nodes >> + * have a bpf_refcount_field, they must be tagged MEM_RCU as well. >> */ >> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS), >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 8db0afa5985c..55607ab30522 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, >> case PTR_TO_BTF_ID | PTR_TRUSTED: >> case PTR_TO_BTF_ID | MEM_RCU: >> case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF: >> + case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU: >> /* When referenced PTR_TO_BTF_ID is passed to release function, >> * its fixed offset must be 0. In the other cases, fixed offset >> * can be non-zero. This was already checked above. So pass >> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env, >> static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg) >> { >> struct bpf_verifier_state *state = env->cur_state; >> + struct btf_record *rec = reg_btf_record(reg); >> if (!state->active_lock.ptr) { >> verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n"); >> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state >> } >> reg->type |= NON_OWN_REF; >> + if (rec->refcount_off >= 0) >> + reg->type |= MEM_RCU; > > Should the above MEM_RCU marking be done unless reg access is in > rcu critical section? I think it is fine, since non-owning references currently exist only within spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption disabled + spin_lock CS should imply RCU CS. [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/ > > I think we still have issues for state resetting > with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which > will try to convert the reg state to PTR_UNTRUSTED. > > Let us say reg state is > PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU > > (1). If hitting bpf_spin_unlock(), since MEM_RCU is in > the reg state, the state should become > PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU > some additional code might be needed so we wont have > verifier complaints about ref_obj_id == 0. > > (2). If hitting bpf_rcu_read_unlock(), the state should become > PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF > since register access still in bpf_spin_lock() region. I agree w/ your comment in side reply stating that this case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS is currently not allowed. > > Does this make sense? > IIUC the specific reg state flow you're recommending is based on the convos we've had over the past few weeks re: getting rid of special non-owning ref lifetime rules, instead using RCU as much as possible. Specifically, this recommended change would remove non-owning ref clobbering, instead just removing NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed to collection kfuncs (refcount_acquire, etc). I agree that in general we'll be able to loosen the lifetime logic for non-owning refs, and your specific suggestion sounds reasonable. IMO it's better to ship that as a separate series, though, as this series was meant to be the minimum changes necessary to re-enable bpf_refcount_acquire, and it's expanded a bit past that already. Easier to reason about the rest of this series' changes without having to account for clobbering changes. >> + >> return 0; >> } >> @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> struct bpf_func_state *state; >> struct bpf_reg_state *reg; >> + if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) { >> + verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n"); >> + return -EACCES; >> + } >> + >> if (rcu_lock) { >> verbose(env, "nested rcu read lock (kernel function %s)\n", func_name); >> return -EINVAL; >> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env) >> return -EINVAL; >> } >> - if (env->cur_state->active_rcu_lock) { >> + if (env->cur_state->active_rcu_lock && >> + !in_rbtree_lock_required_cb(env)) { >> verbose(env, "bpf_rcu_read_unlock is missing\n"); >> return -EINVAL; >> }