On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
> On 8/21/23 10:37 PM, Yonghong Song wrote:
> >
> >
> > On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >> An earlier patch in the series ensures that the underlying memory of
> >> nodes with bpf_refcount - which can have multiple owners - is not reused
> >> until RCU grace period has elapsed. This prevents
> >> use-after-free with non-owning references that may point to
> >> recently-freed memory. While RCU read lock is held, it's safe to
> >> dereference such a non-owning ref, as by definition RCU GP couldn't have
> >> elapsed and therefore underlying memory couldn't have been reused.
> >>
> >> From the perspective of verifier "trustedness" non-owning refs to
> >> refcounted nodes are now trusted only in RCU CS and therefore should no
> >> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> >> MEM_RCU in order to reflect this new state.
> >>
> >> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
> >> ---
> >> include/linux/bpf.h | 3 ++-
> >> kernel/bpf/verifier.c | 13 ++++++++++++-
> >> 2 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index eced6400f778..12596af59c00 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -653,7 +653,8 @@ enum bpf_type_flag {
> >> MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS),
> >> /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> >> - * Currently only valid for linked-list and rbtree nodes.
> >> + * Currently only valid for linked-list and rbtree nodes. If the nodes
> >> + * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> >> */
> >> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS),
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 8db0afa5985c..55607ab30522 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >> case PTR_TO_BTF_ID | PTR_TRUSTED:
> >> case PTR_TO_BTF_ID | MEM_RCU:
> >> case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> >> + case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
> >> /* When referenced PTR_TO_BTF_ID is passed to release function,
> >> * its fixed offset must be 0. In the other cases, fixed offset
> >> * can be non-zero. This was already checked above. So pass
> >> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >> static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >> {
> >> struct bpf_verifier_state *state = env->cur_state;
> >> + struct btf_record *rec = reg_btf_record(reg);
> >> if (!state->active_lock.ptr) {
> >> verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> >> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
> >> }
> >> reg->type |= NON_OWN_REF;
> >> + if (rec->refcount_off >= 0)
> >> + reg->type |= MEM_RCU;
> >
> > Should the above MEM_RCU marking be done unless reg access is in
> > rcu critical section?
>
> I think it is fine, since non-owning references currently exist only within
> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> disabled + spin_lock CS should imply RCU CS.
>
> [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
>
> >
> > I think we still have issues for state resetting
> > with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> > will try to convert the reg state to PTR_UNTRUSTED.
> >
> > Let us say reg state is
> >   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> >
> > (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> > the reg state, the state should become
> >   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> > some additional code might be needed so we won't have
> > verifier complaints about ref_obj_id == 0.
> >
> > (2). If hitting bpf_rcu_read_unlock(), the state should become
> >   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> > since register access still in bpf_spin_lock() region.
>
> I agree w/ your comment in side reply stating that this
> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> is currently not allowed.
>
> >
> > Does this make sense?
> >
>
> IIUC the specific reg state flow you're recommending is based on the convos
> we've had over the past few weeks re: getting rid of special non-owning ref
> lifetime rules, instead using RCU as much as possible. Specifically, this
> recommended change would remove non-owning ref clobbering, instead just removing
> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> to collection kfuncs (refcount_acquire, etc).

Overall the patch set makes sense to me, but I want to clarify the above.

My understanding is that after the patch set is applied, bpf_spin_unlock()
will trigger invalidate_non_owning_refs(), so what Yonghong is saying in (1)
is not correct. Instead, a PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
register will be marked invalid via mark_reg_invalid().

Re: (2), even if/when bpf_rcu_read_unlock() is allowed inside a spin-locked
region, it will convert
  PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
to
  PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
which is a buggy combination that we would need to address if rcu_unlock
is eventually allowed there.

Did I get it right? If so, I think the whole set is good to go.
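To make the register-state transitions discussed in this thread easier to follow, here is a small, self-contained C model. It is a sketch under stated assumptions, not verifier code: the `model_*` functions, the `M_*` bit values, and `struct model_reg` are hypothetical stand-ins for `enum bpf_type_flag`, `struct bpf_reg_state`, `ref_set_non_owning()`, the `invalidate_non_owning_refs()`/`mark_reg_invalid()` path, and the MEM_RCU-to-PTR_UNTRUSTED conversion done at RCU unlock. It assumes the reading given above: `bpf_spin_unlock()` invalidates every non-owning ref, and a (currently disallowed) `bpf_rcu_read_unlock()` inside the lock would strip MEM_RCU and add PTR_UNTRUSTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits. The real values in include/linux/bpf.h are
 * BIT(n + BPF_BASE_TYPE_BITS) and differ from these. The base type is
 * assumed to be PTR_TO_BTF_ID throughout and is not modeled. */
enum model_flag {
	M_PTR_UNTRUSTED = 1u << 0,
	M_MEM_ALLOC     = 1u << 1,
	M_MEM_RCU       = 1u << 2,
	M_NON_OWN_REF   = 1u << 3,
};

/* Hypothetical, simplified register state. */
struct model_reg {
	uint32_t flags;
	bool invalid;     /* models the effect of mark_reg_invalid() */
	int refcount_off; /* models btf_record->refcount_off; < 0 means no bpf_refcount */
};

/* Mirrors the patched ref_set_non_owning(): always tag NON_OWN_REF and,
 * for nodes that have a bpf_refcount field, tag MEM_RCU as well. */
static void model_set_non_owning(struct model_reg *reg)
{
	reg->flags |= M_NON_OWN_REF;
	if (reg->refcount_off >= 0)
		reg->flags |= M_MEM_RCU;
}

/* Assumed bpf_spin_unlock() behavior: every non-owning ref is invalidated,
 * whether or not it also carries MEM_RCU. */
static void model_spin_unlock(struct model_reg *reg)
{
	if (reg->flags & M_NON_OWN_REF)
		reg->invalid = true;
}

/* Hypothetical RCU-unlock conversion, as it would apply if
 * bpf_rcu_read_unlock() were ever allowed inside a spin_lock CS:
 * MEM_RCU is dropped and PTR_UNTRUSTED is added. */
static void model_rcu_unlock(struct model_reg *reg)
{
	if (reg->flags & M_MEM_RCU) {
		reg->flags &= ~(uint32_t)M_MEM_RCU;
		reg->flags |= M_PTR_UNTRUSTED;
	}
}

/* Flags the combination called out as buggy in the thread:
 * a non-owning ref that is also untrusted. */
static bool model_is_buggy_combo(const struct model_reg *reg)
{
	return (reg->flags & M_NON_OWN_REF) && (reg->flags & M_PTR_UNTRUSTED);
}
```

Walking a refcounted node (`refcount_off >= 0`) through `model_set_non_owning()` and then `model_rcu_unlock()` lands on the NON_OWN_REF | PTR_UNTRUSTED combination that `model_is_buggy_combo()` reports, while `model_spin_unlock()` simply invalidates the register, matching points (2) and (1) above. A node without a bpf_refcount field never picks up MEM_RCU, so the RCU-unlock conversion leaves it untouched.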