On Wed, Nov 09, 2022 at 01:33:04AM IST, Yonghong Song wrote: > > > On 11/8/22 9:04 AM, Kumar Kartikeya Dwivedi wrote: > > On Tue, Nov 08, 2022 at 01:11:14PM IST, Yonghong Song wrote: > > > To simplify the design and support the common practice, no > > > nested bpf_rcu_read_lock() is allowed. During verification, > > > each paired bpf_rcu_read_lock()/unlock() has a unique > > > region id, starting from 1. Each rcu ptr register also > > > remembers the region id when the ptr reg is initialized. > > > The following is a simple example to illustrate the > > > rcu lock regions and usage of rcu ptr's. > > > > > > ... <=== rcu lock region 0 > > > bpf_rcu_read_lock() <=== rcu lock region 1 > > > rcu_ptr1 = ... <=== rcu_ptr1 with region 1 > > > ... using rcu_ptr1 ... > > > bpf_rcu_read_unlock() > > > ... <=== rcu lock region -1 > > > bpf_rcu_read_lock() <=== rcu lock region 2 > > > rcu_ptr2 = ... <=== rcu_ptr2 with region 2 > > > ... using rcu_ptr2 ... > > > ... using rcu_ptr1 ... <=== wrong, region 1 rcu_ptr in region 2 > > > bpf_rcu_read_unlock() > > > > > > Outside the rcu lock region, the rcu lock region id is 0 or negative of > > > previous valid rcu lock region id, so the next valid rcu lock region > > > id can be easily computed. > > > > > > Note that rcu protection is not needed for non-sleepable program. But > > > it is supported to make cross-sleepable/nonsleepable development easier. > > > For non-sleepable program, the following insns can be inside the rcu > > > lock region: > > > - any non call insns except BPF_ABS/BPF_IND > > > - non sleepable helpers or kfuncs > > > Also, bpf_*_storage_get() helper's 5th hidden argument (for memory > > > allocation flag) should be GFP_ATOMIC. > > > > > > If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of > > > this pointer and the load which gets this pointer needs to be > > > protected by bpf_rcu_read_lock(). The following shows a couple > > > of examples: > > > struct task_struct { > > > ... > > > struct task_struct __rcu *real_parent; > > > struct css_set __rcu *cgroups; > > > ... > > > }; > > > struct css_set { > > > ... > > > struct cgroup *dfl_cgrp; > > > ... > > > } > > > ... > > > task = bpf_get_current_task_btf(); > > > cgroups = task->cgroups; > > > dfl_cgroup = cgroups->dfl_cgrp; > > > ... using dfl_cgroup ... > > > > > > The bpf_rcu_read_lock/unlock() should be added like below to > > > avoid verification failures. > > > task = bpf_get_current_task_btf(); > > > bpf_rcu_read_lock(); > > > cgroups = task->cgroups; > > > dfl_cgroup = cgroups->dfl_cgrp; > > > bpf_rcu_read_unlock(); > > > ... using dfl_cgroup ... > > > > > > The following is another example for task->real_parent. > > > task = bpf_get_current_task_btf(); > > > bpf_rcu_read_lock(); > > > real_parent = task->real_parent; > > > ... bpf_task_storage_get(&map, real_parent, 0, 0); > > > bpf_rcu_read_unlock(); > > > > > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > > > --- > > > include/linux/bpf.h | 1 + > > > include/linux/bpf_verifier.h | 7 +++ > > > kernel/bpf/btf.c | 32 ++++++++++++- > > > kernel/bpf/verifier.c | 92 +++++++++++++++++++++++++++++++----- > > > 4 files changed, 120 insertions(+), 12 deletions(-) > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > index b4bbcafd1c9b..98af0c9ec721 100644 > > > --- a/include/linux/bpf.h > > > +++ b/include/linux/bpf.h > > > @@ -761,6 +761,7 @@ struct bpf_prog_ops { > > > struct btf_struct_access_info { > > > u32 next_btf_id; > > > enum bpf_type_flag flag; > > > + bool is_rcu; > > > }; > > > > > > struct bpf_verifier_ops { > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > > > index 1a32baa78ce2..5d703637bb12 100644 > > > --- a/include/linux/bpf_verifier.h > > > +++ b/include/linux/bpf_verifier.h > > > @@ -179,6 +179,10 @@ struct bpf_reg_state { > > > */ > > > s32 subreg_def; > > > enum bpf_reg_liveness live; > > > + /* 0: not rcu ptr; > 0: rcu ptr, id of the rcu read lock region where > > > + * the rcu ptr reg is initialized. > > > + */ > > > + int active_rcu_lock; > > > /* if (!precise && SCALAR_VALUE) min/max/tnum don't affect safety */ > > > bool precise; > > > }; > > > @@ -324,6 +328,8 @@ struct bpf_verifier_state { > > > u32 insn_idx; > > > u32 curframe; > > > u32 active_spin_lock; > > > + /* <= 0: not in rcu read lock region; > 0: the rcu lock region id */ > > > + int active_rcu_lock; > > > bool speculative; > > > > > > /* first and last insn idx of this verifier state */ > > > @@ -424,6 +430,7 @@ struct bpf_insn_aux_data { > > > u32 seen; /* this insn was processed by the verifier at env->pass_cnt */ > > > bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */ > > > bool zext_dst; /* this insn zero extends dst reg */ > > > + bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */ > > > u8 alu_state; /* used in combination with alu_limit */ > > > > > > /* below fields are initialized once */ > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > > > index d2ee1669a2f3..c5a9569f2ae0 100644 > > > --- a/kernel/bpf/btf.c > > > +++ b/kernel/bpf/btf.c > > > @@ -5831,6 +5831,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf, > > > if (btf_type_is_ptr(mtype)) { > > > const struct btf_type *stype, *t; > > > enum bpf_type_flag tmp_flag = 0; > > > + bool is_rcu = false; > > > u32 id; > > > > > > if (msize != size || off != moff) { > > > @@ -5850,12 +5851,16 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf, > > > /* check __percpu tag */ > > > if (strcmp(tag_value, "percpu") == 0) > > > tmp_flag = MEM_PERCPU; > > > + /* check __rcu tag */ > > > + if (strcmp(tag_value, "rcu") == 0) > > > + is_rcu = true; > > > } > > > > > > stype = btf_type_skip_modifiers(btf, mtype->type, &id); > > > if (btf_type_is_struct(stype)) { > > > info->next_btf_id = id; > > > info->flag = tmp_flag; > > > + info->is_rcu = is_rcu; > > > return WALK_PTR; > > > } > > > } > > > @@ -6317,7 +6322,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, > > > { > > > enum bpf_prog_type prog_type = resolve_prog_type(env->prog); > > > bool rel = false, kptr_get = false, trusted_args = false; > > > - bool sleepable = false; > > > + bool sleepable = false, rcu_lock = false, rcu_unlock = false; > > > struct bpf_verifier_log *log = &env->log; > > > u32 i, nargs, ref_id, ref_obj_id = 0; > > > bool is_kfunc = btf_is_kernel(btf); > > > @@ -6356,6 +6361,31 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, > > > kptr_get = kfunc_meta->flags & KF_KPTR_GET; > > > trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS; > > > sleepable = kfunc_meta->flags & KF_SLEEPABLE; > > > + rcu_lock = kfunc_meta->flags & KF_RCU_LOCK; > > > + rcu_unlock = kfunc_meta->flags & KF_RCU_UNLOCK; > > > + } > > > + > > > + /* checking rcu read lock/unlock */ > > > + if (env->cur_state->active_rcu_lock > 0) { > > > + if (rcu_lock) { > > > + bpf_log(log, "nested rcu read lock (kernel function %s)\n", func_name); > > > + return -EINVAL; > > > + } else if (rcu_unlock) { > > > + /* change active_rcu_lock to its corresponding negative value to > > > + * preserve the previous lock region id. > > > + */ > > > + env->cur_state->active_rcu_lock = -env->cur_state->active_rcu_lock; > > > + } else if (sleepable) { > > > + bpf_log(log, "kernel func %s is sleepable within rcu_read_lock region\n", > > > + func_name); > > > + return -EINVAL; > > > + } > > > + } else if (rcu_lock) { > > > + /* a new lock region started, increase the region id. */ > > > + env->cur_state->active_rcu_lock = (-env->cur_state->active_rcu_lock) + 1; > > > + } else if (rcu_unlock) { > > > + bpf_log(log, "unmatched rcu read unlock (kernel function %s)\n", func_name); > > > + return -EINVAL; > > > } > > > > > > > Can you provide more context on why having ids is better than simply > > invalidating the registers when the section ends, and making active_rcu_lock a > > boolean instead? You can use bpf_for_each_reg_in_vstate to find every reg having > > MEM_RCU and mark it unknown. > > I think we also need to invalidate rcu-ptr related states as well in spills. > > I also tried to support cases like: > bpf_rcu_read_lock(); > rcu_ptr = ... > ... rcu_ptr ... > bpf_rcu_read_unlock(); > ... rcu_ptr ... /* no load, just use the rcu_ptr somehow */ > > In the above case, outside the rcu read lock region, there is no > load with rcu_ptr but it can still be used for other purposes > with a property of a pointer. > > But for a second thought, it should be okay to invalidate > rcu_ptr during bpf_rcu_read_unlock() as a scalar. This should > satisfy almost all (if not all) cases. > > > > > You won't have to match the id in btf_struct_access as such registers won't ever > > reach that function (if marked unknown on invalidation, they become scalars). > > The reg state won't need another active_rcu_lock member either, it is simply > > part of reg->type. > > Right, if I don't maintain region id's, no need to have reg->active_rcu_lock > and using MEM_RCU should be enough. > > > > > It seems to that simply invalidating registers when rcu_read_unlock is called is > > both less code to write and simpler to understand. > > invalidating rcu_ptr in registers and spills. > If you use bpf_for_each_reg_in_vstate, it should cover both.