On Fri, Nov 4, 2022 at 4:27 PM Alexei Starovoitov <ast@xxxxxxxx> wrote:
>
> On 11/4/22 5:13 AM, KP Singh wrote:
> > On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> >>
> >> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
> >>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> >>> object access needing rcu_read_lock protection. The rcu protection
> >>> is not needed for non-sleepable programs, so the various verification
> >>> checks are only done for sleepable programs. In particular, only
> >>> the following insns can be inside a bpf_rcu_read_lock() region:
> >>>   - any non-call insns except BPF_ABS/BPF_IND
> >>>   - non-sleepable helpers and kfuncs.
> >>> Also, the bpf_*_storage_get() helpers' 5th hidden argument (the memory
> >>> allocation flag) should be GFP_ATOMIC.
> >>>
> >>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> >>> this pointer and the load which gets this pointer need to be
> >>> protected by bpf_rcu_read_lock(). The following shows a couple
> >>> of examples:
> >>>   struct task_struct {
> >>>       ...
> >>>       struct task_struct __rcu *real_parent;
> >>>       struct css_set __rcu *cgroups;
> >>>       ...
> >>>   };
> >>>   struct css_set {
> >>>       ...
> >>>       struct cgroup *dfl_cgrp;
> >>>       ...
> >>>   }
> >>>   ...
> >>>   task = bpf_get_current_task_btf();
> >>>   cgroups = task->cgroups;
> >>>   dfl_cgroup = cgroups->dfl_cgrp;
> >>>   ... using dfl_cgroup ...
> >>>
> >>> bpf_rcu_read_lock/unlock() should be added as below to
> >>> avoid verification failures:
> >>>   task = bpf_get_current_task_btf();
> >>>   bpf_rcu_read_lock();
> >>>   cgroups = task->cgroups;
> >>>   dfl_cgroup = cgroups->dfl_cgrp;
> >>>   bpf_rcu_read_unlock();
> >>>   ... using dfl_cgroup ...
> >>>
> >>> The following is another example for task->real_parent:
> >>>   task = bpf_get_current_task_btf();
> >>>   bpf_rcu_read_lock();
> >>>   real_parent = task->real_parent;
> >>>   ...
> >>>   bpf_task_storage_get(&map, real_parent, 0, 0);
> >>>   bpf_rcu_read_unlock();
> >>>
> >>> There is another case observed in selftest bpf_iter_ipv6_route.c:
> >>>   struct fib6_info *rt = ctx->rt;
> >>>   ...
> >>>   fib6_nh = &rt->fib6_nh[0];  // Not rcu protected
> >>>   ...
> >>>   if (rt->nh)
> >>>       fib6_nh = &nh->nh_info->fib6_nh;  // rcu protected
> >>>   ...
> >>>   ... using fib6_nh ...
> >>> Currently verification will fail with
> >>>   same insn cannot be used with different pointers
> >>> since the use of fib6_nh is tagged with rcu in one path
> >>> but not in the other. The above use case is a valid one,
> >>> so the verifier is changed to ignore the MEM_RCU type tag
> >>> in such cases.
> >>>
> >>> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> >>> ---
> >>>   include/linux/bpf.h          |   3 +
> >>>   include/linux/bpf_verifier.h |   1 +
> >>>   kernel/bpf/btf.c             |  11 +++
> >>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
> >>>   4 files changed, 133 insertions(+), 8 deletions(-)
> >
> > [...]
> >
> >>> +
> >>
> >> This isn't right. Every load that obtains an RCU pointer needs to become tied to
> >> the current RCU section, and needs to be invalidated once the RCU section ends.
> >>
> >> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
> >> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
> >>
> >> Otherwise, with the current logic, the following would become possible:
> >>
> >>   bpf_rcu_read_lock();
> >>   p = rcu_dereference(foo->rcup);
> >>   bpf_rcu_read_unlock();
> >>
> >>   // p is possibly dead
> >>
> >>   bpf_rcu_read_lock();
> >>   // use p
> >>   bpf_rcu_read_unlock();
> >>
> >
> > What do you want to do about cases like:
> >
> >   bpf_rcu_read_lock();
> >
> >   q = rcu_dereference(foo->rcup);
> >
> >   bpf_rcu_read_lock();
>
> This one should be rejected for simplicity.
> Let's not complicate things with nested cs-s.
Agreed. The current logic tries to count the number of active critical
sections; the verifier should instead just reject the program if there is a
nested bpf_rcu_read_lock() call.

> >
> >   p = rcu_dereference(foo->rcup);
> >
> >   bpf_rcu_read_unlock();
> >
> >   // Use q
> >   // Use p
> >   bpf_rcu_read_unlock();
> >
> > I think this is probably implied in your statement, but just making it
> > clear: the invalidation needs to happen only when the outermost
> > bpf_rcu_read_unlock is called, i.e. when active_rcu_lock goes back down
> > to 0.
> >
>