Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected


 



On Tue, Aug 22, 2023 at 5:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
>
> On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
> >> On 8/21/23 10:37 PM, Yonghong Song wrote:
> >>>
> >>>
> >>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>>> An earlier patch in the series ensures that the underlying memory of
> >>>> nodes with bpf_refcount - which can have multiple owners - is not reused
> >>>> until RCU grace period has elapsed. This prevents
> >>>> use-after-free with non-owning references that may point to
> >>>> recently-freed memory. While RCU read lock is held, it's safe to
> >>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
> >>>> elapsed and therefore underlying memory couldn't have been reused.
> >>>>
> >>>> From the perspective of verifier "trustedness" non-owning refs to
> >>>> refcounted nodes are now trusted only in RCU CS and therefore should no
> >>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> >>>> MEM_RCU in order to reflect this new state.
> >>>>
> >>>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
> >>>> ---
> >>>>    include/linux/bpf.h   |  3 ++-
> >>>>    kernel/bpf/verifier.c | 13 ++++++++++++-
> >>>>    2 files changed, 14 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >>>> index eced6400f778..12596af59c00 100644
> >>>> --- a/include/linux/bpf.h
> >>>> +++ b/include/linux/bpf.h
> >>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
> >>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
> >>>>
> >>>>        /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> >>>> -     * Currently only valid for linked-list and rbtree nodes.
> >>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
> >>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> >>>>         */
> >>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
> >>>>
> >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>>> index 8db0afa5985c..55607ab30522 100644
> >>>> --- a/kernel/bpf/verifier.c
> >>>> +++ b/kernel/bpf/verifier.c
> >>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
> >>>>        case PTR_TO_BTF_ID | MEM_RCU:
> >>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> >>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
> >>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
> >>>>             * its fixed offset must be 0. In the other cases, fixed offset
> >>>>             * can be non-zero. This was already checked above. So pass
> >>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >>>>    {
> >>>>        struct bpf_verifier_state *state = env->cur_state;
> >>>> +    struct btf_record *rec = reg_btf_record(reg);
> >>>>
> >>>>        if (!state->active_lock.ptr) {
> >>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> >>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
> >>>>        }
> >>>>
> >>>>        reg->type |= NON_OWN_REF;
> >>>> +    if (rec->refcount_off >= 0)
> >>>> +        reg->type |= MEM_RCU;
> >>>
> >>> Should the above MEM_RCU marking be done only if the reg access is
> >>> in an RCU critical section?
> >>
> >> I think it is fine, since non-owning references currently exist only within
> >> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> >> disabled + spin_lock CS should imply RCU CS.
> >>
> >>    [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
> >>
> >>>
> >>> I think we still have issues for state resetting
> >>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> >>> will try to convert the reg state to PTR_UNTRUSTED.
> >>>
> >>> Let us say reg state is
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> >>>
> >>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> >>> the reg state, the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> >>> some additional code might be needed so we won't have
> >>> verifier complaints about ref_obj_id == 0.
> >>>
> >>> (2). If hitting bpf_rcu_read_unlock(), the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> >>> since the register access is still in the bpf_spin_lock() region.
> >>
> >> I agree w/ your comment in side reply stating that this
> >> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> >> is currently not allowed.
> >>
> >>>
> >>> Does this make sense?
> >>>
> >>
> >>
> >> IIUC the specific reg state flow you're recommending is based on the convos
> >> we've had over the past few weeks re: getting rid of special non-owning ref
> >> lifetime rules, instead using RCU as much as possible. Specifically, this
> >> recommended change would remove non-owning ref clobbering, instead just removing
> >> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> >> to collection kfuncs (refcount_acquire, etc).
> >
> > Overall the patch set makes sense to me, but I want to clarify the above.
> > My understanding is that after the patch set is applied, bpf_spin_unlock()
> > will still call invalidate_non_owning_refs(), so what Yonghong is saying
> > in (1) is not correct.
> > Instead, a PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU reg will be
> > invalidated via mark_reg_invalid().
>
> I said it 'should become ...', but you are right. Right now, it will do
> mark_reg_invalid(). So it is correct, just maybe a little conservative.

Ahh. You mean that it should be fixed to do that. Got it.
non_own_ref after spin_unlock should become a pure mem_rcu pointer.
Need to think it through. Probably correct.
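
To make the two behaviors concrete, here is a rough userspace sketch of the
reg type transitions discussed above. It is not verifier code: every model_*
name and every flag value is made up for illustration, and only the flag
names mirror the patch. It assumes that "pure mem_rcu pointer" means dropping
just NON_OWN_REF when MEM_RCU is set, and that nodes without a refcount keep
getting invalidated.

```c
#include <assert.h>

/* Illustrative flag values only; the kernel's real values differ. */
enum model_type_flag {
	MODEL_PTR_TO_BTF_ID = 1 << 0,
	MODEL_MEM_ALLOC     = 1 << 1,
	MODEL_MEM_RCU       = 1 << 2,
	MODEL_NON_OWN_REF   = 1 << 3,
};

#define MODEL_INVALID 0u

/* Mirrors the patched ref_set_non_owning(): always tag NON_OWN_REF,
 * and additionally tag MEM_RCU when the node has a refcount field.
 */
static unsigned int model_set_non_owning(unsigned int type, int has_refcount)
{
	type |= MODEL_NON_OWN_REF;
	if (has_refcount)
		type |= MODEL_MEM_RCU;
	return type;
}

/* Behavior as described for this series: spin_unlock invalidates
 * every non-owning ref, whether or not it is also MEM_RCU.
 */
static unsigned int model_unlock_current(unsigned int type)
{
	if (type & MODEL_NON_OWN_REF)
		return MODEL_INVALID;
	return type;
}

/* Possible follow-up (assumption): a non-owning ref that is also MEM_RCU
 * only loses NON_OWN_REF; other non-owning refs are still invalidated.
 */
static unsigned int model_unlock_proposed(unsigned int type)
{
	if (!(type & MODEL_NON_OWN_REF))
		return type;
	if (type & MODEL_MEM_RCU)
		return type & ~(unsigned int)MODEL_NON_OWN_REF;
	return MODEL_INVALID;
}

/* Walks both node kinds through both unlock models. */
void model_self_check(void)
{
	unsigned int base = MODEL_PTR_TO_BTF_ID | MODEL_MEM_ALLOC;
	unsigned int refcounted = model_set_non_owning(base, 1);
	unsigned int plain = model_set_non_owning(base, 0);

	assert(refcounted == (base | MODEL_NON_OWN_REF | MODEL_MEM_RCU));
	assert(plain == (base | MODEL_NON_OWN_REF));

	assert(model_unlock_current(refcounted) == MODEL_INVALID);
	assert(model_unlock_current(plain) == MODEL_INVALID);

	assert(model_unlock_proposed(refcounted) == (base | MODEL_MEM_RCU));
	assert(model_unlock_proposed(plain) == MODEL_INVALID);
}
```

Calling model_self_check() from a small main() exercises all four cases. The
only difference between the two models is the refcounted node after unlock:
invalid today, PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU under the possible
follow-up.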




