On Fri, Jan 19, 2024 at 6:27 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 1/16/24 6:48 PM, Yafang Shao wrote:
> > Add three new kfuncs for bpf_iter_cpumask.
> > - bpf_iter_cpumask_new
> >   It is defined with KF_RCU_PROTECTED and KF_RCU.
> >   KF_RCU_PROTECTED is defined because we must use it under the
> >   protection of RCU.
> >   KF_RCU is defined because the cpumask must be an RCU-trusted
> >   pointer such as task->cpus_ptr.
>
> I am not sure whether we need both or not.
>
> KF_RCU_PROTECTED means the function call needs to be within the rcu cs.
> KF_RCU means the argument usage needs to be within the rcu cs.
> We only need one of them (preferably KF_RCU).

As you explained below, KF_RCU_PROTECTED is actually for
bpf_iter_cpumask_next().

>
> > - bpf_iter_cpumask_next
> > - bpf_iter_cpumask_destroy
> >
> > These new kfuncs facilitate the iteration of percpu data, such as
> > runqueues, psi_cgroup_cpu, and more.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 69 insertions(+)
> >
> > diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> > index 2e73533a3811..1840e48e6142 100644
> > --- a/kernel/bpf/cpumask.c
> > +++ b/kernel/bpf/cpumask.c
> > @@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> >  	return cpumask_weight(cpumask);
> >  }
> >
> > +struct bpf_iter_cpumask {
> > +	__u64 __opaque[2];
> > +} __aligned(8);
> > +
> > +struct bpf_iter_cpumask_kern {
> > +	const struct cpumask *mask;
> > +	int cpu;
> > +} __aligned(8);
> > +
> > +/**
> > + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
> > + * @it: The new bpf_iter_cpumask to be created.
> > + * @mask: The cpumask to be iterated over.
> > + *
> > + * This function initializes a new bpf_iter_cpumask structure for iterating over
> > + * the specified CPU mask. It assigns the provided cpumask to the newly created
> > + * bpf_iter_cpumask @it for subsequent iteration operations.
> > + *
> > + * On success, 0 is returned. On failure, a negative errno is returned.
> > + */
> > +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> > +{
> > +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> > +
> > +	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> > +	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> > +		     __alignof__(struct bpf_iter_cpumask));
> > +
> > +	kit->mask = mask;
> > +	kit->cpu = -1;
> > +	return 0;
> > +}
>
> We have a problem here. Let us say bpf_iter_cpumask_new() is called inside
> an rcu cs. Once control goes out of the rcu cs, 'mask' could be freed,
> right? Or you would have to require that bpf_iter_cpumask_next() be in the
> same rcu cs as bpf_iter_cpumask_new(). But such a requirement seems odd.
>
> I think we can do things similar to bpf_iter_task_vma. You can allocate
> memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of
> the mask. This way, you do not need to worry about a potential
> use-after-free issue. The memory can be freed with
> bpf_iter_cpumask_destroy().

Good suggestion. That seems better.
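Untested sketch of what I have in mind for the next version, modeled on
bpf_iter_task_vma_new()/_destroy() (reusing bpf_global_ma; the kern
struct's mask member becomes non-const since it now points to our own
copy):

struct bpf_iter_cpumask_kern {
	struct cpumask *mask;
	int cpu;
} __aligned(8);

__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
		     __alignof__(struct bpf_iter_cpumask));

	kit->cpu = -1;

	/* Keep a private copy of @mask so the iterator stays valid after
	 * the rcu cs in which it was created has ended.
	 */
	kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
	if (!kit->mask)
		return -ENOMEM;

	cpumask_copy(kit->mask, mask);
	return 0;
}

__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	/* ->mask is NULL if _new() failed; destroy is still called then. */
	if (!kit->mask)
		return;
	bpf_mem_free(&bpf_global_ma, kit->mask);
}

With the copy in place, I think KF_RCU_PROTECTED can then be dropped and
only KF_RCU kept, so only _new() has to run inside the rcu cs. From a
BPF prog the iterator would be used roughly like below (do_something()
is just a placeholder):

	struct bpf_iter_cpumask it;
	int *cpu, err;

	bpf_rcu_read_lock();
	/* p->cpus_ptr is only rcu-trusted, so _new() stays in the cs */
	err = bpf_iter_cpumask_new(&it, p->cpus_ptr);
	bpf_rcu_read_unlock();

	if (!err) {
		/* safe outside the cs thanks to the private copy */
		while ((cpu = bpf_iter_cpumask_next(&it)))
			do_something(*cpu);
	}
	bpf_iter_cpumask_destroy(&it);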
>
> > +
> > +/**
> > + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
> > + * @it: The bpf_iter_cpumask
> > + *
> > + * This function retrieves a pointer to the number of the next CPU within the
> > + * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
> > + * cpumask. If there are no further CPUs available, it returns NULL.
> > + *
> > + * Returns a pointer to the number of the next CPU in the cpumask or NULL if
> > + * no further CPUs are available.
> > + */
> > +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> > +{
> > +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> > +	const struct cpumask *mask = kit->mask;
> > +	int cpu;
> > +
> > +	cpu = cpumask_next(kit->cpu, mask);
> > +	if (cpu >= nr_cpu_ids)
> > +		return NULL;
> > +
> > +	kit->cpu = cpu;
> > +	return &kit->cpu;
> > +}
> > +
> > +/**
> > + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
> > + * @it: The bpf_iter_cpumask to be destroyed.
> > + */
> > +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
> > +{
> > +}
> > +
> >  __bpf_kfunc_end_defs();
> >
> >  BTF_SET8_START(cpumask_kfunc_btf_ids)
> > @@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
> >  BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
> >  BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
> >  BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
> > +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU_PROTECTED | KF_RCU)
> > +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
> >  BTF_SET8_END(cpumask_kfunc_btf_ids)
> >
> >  static const struct btf_kfunc_id_set cpumask_kfunc_set = {

-- 
Regards
Yafang