Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support

On Fri, Nov 4, 2022 at 4:27 PM Alexei Starovoitov <ast@xxxxxxxx> wrote:
>
> On 11/4/22 5:13 AM, KP Singh wrote:
> > On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> >>
> >> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
> >>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> >>> object access needing rcu_read_lock protection. The rcu protection
> >>> is not needed for non-sleepable programs, so the various verification
> >>> checks are only done for sleepable programs. In particular, only
> >>> the following insns can be inside a bpf_rcu_read_lock() region:
> >>>    - any non-call insns except BPF_ABS/BPF_IND
> >>>    - non-sleepable helpers and kfuncs.
> >>> Also, the bpf_*_storage_get() helpers' 5th hidden argument (the memory
> >>> allocation flag) should be GFP_ATOMIC.
> >>>
> >>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> >>> this pointer, and the load which obtains it, need to be
> >>> protected by bpf_rcu_read_lock(). The following shows a couple
> >>> of examples:
> >>>    struct task_struct {
> >>>        ...
> >>>        struct task_struct __rcu        *real_parent;
> >>>        struct css_set __rcu            *cgroups;
> >>>        ...
> >>>    };
> >>>    struct css_set {
> >>>        ...
> >>>        struct cgroup *dfl_cgrp;
> >>>        ...
> >>>    }
> >>>    ...
> >>>    task = bpf_get_current_task_btf();
> >>>    cgroups = task->cgroups;
> >>>    dfl_cgroup = cgroups->dfl_cgrp;
> >>>    ... using dfl_cgroup ...
> >>>
> >>> bpf_rcu_read_lock()/unlock() should be added as shown below to
> >>> avoid verification failures.
> >>>    task = bpf_get_current_task_btf();
> >>>    bpf_rcu_read_lock();
> >>>    cgroups = task->cgroups;
> >>>    dfl_cgroup = cgroups->dfl_cgrp;
> >>>    bpf_rcu_read_unlock();
> >>>    ... using dfl_cgroup ...
> >>>
> >>> The following is another example for task->real_parent.
> >>>    task = bpf_get_current_task_btf();
> >>>    bpf_rcu_read_lock();
> >>>    real_parent = task->real_parent;
> >>>    ... bpf_task_storage_get(&map, real_parent, 0, 0);
> >>>    bpf_rcu_read_unlock();
> >>>
> >>> There is another case observed in the selftest bpf_iter_ipv6_route.c:
> >>>    struct fib6_info *rt = ctx->rt;
> >>>    ...
> >>>    fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
> >>>    ...
> >>>    if (rt->nh)
> >>>      fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
> >>>    ...
> >>>    ... using fib6_nh ...
> >>> Currently verification will fail with
> >>>    same insn cannot be used with different pointers
> >>> since the use of fib6_nh is tagged with rcu in one path
> >>> but not in the other path. The above use case is a valid
> >>> one, so the verifier is changed to ignore the MEM_RCU type tag
> >>> in such cases.
> >>>
> >>> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> >>> ---
> >>>   include/linux/bpf.h          |   3 +
> >>>   include/linux/bpf_verifier.h |   1 +
> >>>   kernel/bpf/btf.c             |  11 +++
> >>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
> >>>   4 files changed, 133 insertions(+), 8 deletions(-)
> >
> > [...]
> >
> >>> +
> >>
> >> This isn't right. Every load that obtains an RCU pointer needs to become tied to
> >> the current RCU section, and needs to be invalidated once the RCU section ends.
> >>
> >> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
> >> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
> >>
> >> Otherwise, with the current logic, the following would become possible:
> >>
> >> bpf_rcu_read_lock();
> >> p = rcu_dereference(foo->rcup);
> >> bpf_rcu_read_unlock();
> >>
> >> // p is possibly dead
> >>
> >> bpf_rcu_read_lock();
> >> // use p
> >> bpf_rcu_read_unlock();
> >>
> >
> > What do we want to do about cases like:
> >
> > bpf_rcu_read_lock();
> >
> > q = rcu_dereference(foo->rcup);
> >
> > bpf_rcu_read_lock();
>
> This one should be rejected for simplicity.
> Let's not complicate things with nested cs-s.

Agreed. The current logic tries to count the number of active
critical sections; the verifier should instead simply reject
any nested bpf_rcu_read_lock() call.
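
To make that concrete, here is a tiny standalone C model of the
bookkeeping being discussed (not verifier code; only the
active_rcu_lock name mirrors the patch, everything else is
illustrative):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_REGS 11
#define MEM_RCU  (1u << 0)              /* stand-in for the real type flag */

struct state {
        bool active_rcu_lock;           /* inside bpf_rcu_read_lock()? */
        unsigned int reg_flags[MAX_REGS];
};

static int rcu_read_lock(struct state *st)
{
        if (st->active_rcu_lock)
                return -EINVAL;         /* nested critical section: reject */
        st->active_rcu_lock = true;
        return 0;
}

static int rcu_read_unlock(struct state *st)
{
        if (!st->active_rcu_lock)
                return -EINVAL;         /* unmatched unlock: reject */
        st->active_rcu_lock = false;
        /* every pointer obtained under the lock dies here */
        for (int i = 0; i < MAX_REGS; i++)
                if (st->reg_flags[i] & MEM_RCU)
                        st->reg_flags[i] = 0;   /* "invalidate" the reg */
        return 0;
}

int main(void)
{
        struct state st = {};

        rcu_read_lock(&st);
        st.reg_flags[1] = MEM_RCU;      /* p = rcu_dereference(...) */
        printf("nested lock -> %d\n", rcu_read_lock(&st));   /* -EINVAL */
        rcu_read_unlock(&st);
        printf("reg1 after unlock: %u\n", st.reg_flags[1]);  /* 0 */
        return 0;
}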

>
> >
> > p = rcu_dereference(foo->rcup);
> >
> > bpf_rcu_read_unlock();
> >
> > // Use q
> > // Use p
> > bpf_rcu_read_unlock();
> >
> > I think this is probably implied in your statement but just making it clear,
> >
> > The invalidation needs to happen only when the outermost bpf_rcu_read_unlock
> > is called, i.e. when active_rcu_lock goes back down to 0.
> >
>
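
With nesting rejected there is exactly one bpf_rcu_read_unlock() per
critical section, so the invalidation point is unambiguous. For
completeness, a rough sketch of what a sleepable program using the
proposed kfuncs could look like under these rules; the __ksym
declarations, map layout and attach point below are only illustrative,
not taken from the patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* the kfuncs proposed in this series, assumed usable as ksyms */
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

struct {
        __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, long);
} task_map SEC(".maps");

SEC("fentry.s/__x64_sys_nanosleep")     /* any sleepable attach point */
int BPF_PROG(rcu_example)
{
        struct task_struct *task = bpf_get_current_task_btf();
        struct task_struct *parent;
        long *val;

        bpf_rcu_read_lock();
        parent = task->real_parent;     /* __rcu field: load is MEM_RCU tagged */
        /* hidden gfp flag must be GFP_ATOMIC inside the RCU section */
        val = bpf_task_storage_get(&task_map, parent, 0,
                                   BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (val)
                *val += 1;
        bpf_rcu_read_unlock();
        /*
         * Any use of 'parent' past this point has to be rejected:
         * per the discussion above, every MEM_RCU pointer is
         * invalidated when the critical section ends.
         */
        return 0;
}

char _license[] SEC("license") = "GPL";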


