On Tue, Aug 22, 2023 at 5:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
>
> On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
> >> On 8/21/23 10:37 PM, Yonghong Song wrote:
> >>>
> >>>
> >>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>>> An earlier patch in the series ensures that the underlying memory of
> >>>> nodes with bpf_refcount - which can have multiple owners - is not reused
> >>>> until RCU grace period has elapsed. This prevents
> >>>> use-after-free with non-owning references that may point to
> >>>> recently-freed memory. While RCU read lock is held, it's safe to
> >>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
> >>>> elapsed and therefore underlying memory couldn't have been reused.
> >>>>
> >>>> From the perspective of verifier "trustedness" non-owning refs to
> >>>> refcounted nodes are now trusted only in RCU CS and therefore should no
> >>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> >>>> MEM_RCU in order to reflect this new state.
> >>>>
> >>>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
> >>>> ---
> >>>>  include/linux/bpf.h   |  3 ++-
> >>>>  kernel/bpf/verifier.c | 13 ++++++++++++-
> >>>>  2 files changed, 14 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >>>> index eced6400f778..12596af59c00 100644
> >>>> --- a/include/linux/bpf.h
> >>>> +++ b/include/linux/bpf.h
> >>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
> >>>>      MEM_RCU = BIT(13 + BPF_BASE_TYPE_BITS),
> >>>>      /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> >>>> -     * Currently only valid for linked-list and rbtree nodes.
> >>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
> >>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> >>>>       */
> >>>>      NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS),
> >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>>> index 8db0afa5985c..55607ab30522 100644
> >>>> --- a/kernel/bpf/verifier.c
> >>>> +++ b/kernel/bpf/verifier.c
> >>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >>>>      case PTR_TO_BTF_ID | PTR_TRUSTED:
> >>>>      case PTR_TO_BTF_ID | MEM_RCU:
> >>>>      case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> >>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
> >>>>          /* When referenced PTR_TO_BTF_ID is passed to release function,
> >>>>           * its fixed offset must be 0. In the other cases, fixed offset
> >>>>           * can be non-zero. This was already checked above. So pass
> >>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >>>>  static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >>>>  {
> >>>>      struct bpf_verifier_state *state = env->cur_state;
> >>>> +    struct btf_record *rec = reg_btf_record(reg);
> >>>>      if (!state->active_lock.ptr) {
> >>>>          verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> >>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
> >>>>      }
> >>>>      reg->type |= NON_OWN_REF;
> >>>> +    if (rec->refcount_off >= 0)
> >>>> +        reg->type |= MEM_RCU;
> >>>
> >>> Should the above MEM_RCU marking be done unless reg access is in
> >>> rcu critical section?
> >>
> >> I think it is fine, since non-owning references currently exist only within
> >> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> >> disabled + spin_lock CS should imply RCU CS.
> >>
> >> [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
> >>
> >>>
> >>> I think we still have issues for state resetting
> >>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> >>> will try to convert the reg state to PTR_UNTRUSTED.
> >>>
> >>> Let us say reg state is
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> >>>
> >>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> >>> the reg state, the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> >>> some additional code might be needed so we wont have
> >>> verifier complaints about ref_obj_id == 0.
> >>>
> >>> (2). If hitting bpf_rcu_read_unlock(), the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> >>> since register access still in bpf_spin_lock() region.
> >>
> >> I agree w/ your comment in side reply stating that this
> >> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> >> is currently not allowed.
> >>
> >>>
> >>> Does this make sense?
> >>>
> >>
> >>
> >> IIUC the specific reg state flow you're recommending is based on the convos
> >> we've had over the past few weeks re: getting rid of special non-owning ref
> >> lifetime rules, instead using RCU as much as possible. Specifically, this
> >> recommended change would remove non-owning ref clobbering, instead just removing
> >> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> >> to collection kfuncs (refcount_acquire, etc).
> >
> > Overall the patch set makes sense to me, but I want to clarify above.
> > My understanding that after the patch set applied bpf_spin_unlock()
> > will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
> > is not correct.
> > Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().
>
> I said it 'should become ...', but you are right. right now, it will do
> mark_reg_invalid(). So it is correct just MAYBE a little conservative.

Ahh. You mean that it should be fixed to do that. Got it.
non_own_ref after spin_unlock should become a pure mem_rcu pointer.
Need to think it through. Probably correct.
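
To make the two unlock behaviors discussed above concrete, here is a rough standalone C sketch. It is a toy model, not verifier code: every toy_* name and the flag bit values are made up for illustration (the real flags live in include/linux/bpf.h with different values), and the "proposed" transition is only one reading of Yonghong's point (1), assuming the reg simply loses NON_OWN_REF and keeps MEM_RCU.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
enum toy_type_flag {
	TOY_INVALID       = 0,
	TOY_PTR_TO_BTF_ID = 1u << 0,
	TOY_MEM_ALLOC     = 1u << 1,
	TOY_MEM_RCU       = 1u << 2,
	TOY_NON_OWN_REF   = 1u << 3,
};

/* Mirrors the marking added by the patch: a non-owning ref to a node
 * that has a refcount field is additionally tagged MEM_RCU. */
static unsigned int toy_set_non_owning(unsigned int type, int has_refcount)
{
	type |= TOY_NON_OWN_REF;
	if (has_refcount)
		type |= TOY_MEM_RCU;
	return type;
}

/* Behavior as described in the thread today: on spin unlock every
 * non-owning ref is invalidated. */
static unsigned int toy_unlock_current(unsigned int type)
{
	if (type & TOY_NON_OWN_REF)
		return TOY_INVALID;
	return type;
}

/* Assumed reading of the proposal: a non-owning ref that is also
 * MEM_RCU drops NON_OWN_REF and stays a plain MEM_RCU pointer; a
 * non-owning ref without MEM_RCU is still invalidated. */
static unsigned int toy_unlock_proposed(unsigned int type)
{
	if (!(type & TOY_NON_OWN_REF))
		return type;
	if (type & TOY_MEM_RCU)
		return type & ~(unsigned int)TOY_NON_OWN_REF;
	return TOY_INVALID;
}

/* Small self-check walking through the refcounted-node case. */
static void toy_self_check(void)
{
	unsigned int base = TOY_PTR_TO_BTF_ID | TOY_MEM_ALLOC;
	unsigned int reg = toy_set_non_owning(base, 1);

	assert(reg == (base | TOY_NON_OWN_REF | TOY_MEM_RCU));
	assert(toy_unlock_current(reg) == TOY_INVALID);
	assert(toy_unlock_proposed(reg) == (base | TOY_MEM_RCU));
}
```

The sketch leaves out the ref_obj_id == 0 complaint Yonghong mentions; that would need handling in the real verifier and is not modeled here.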