On Wed, Jan 24, 2024 at 1:31 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Wed, Jan 24, 2024 at 2:26 AM David Vernet <void@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jan 23, 2024 at 11:27:14PM +0800, Yafang Shao wrote:
> > > Add three new kfuncs for bpf_iter_cpumask:
> > > - bpf_iter_cpumask_new
> > >   KF_RCU is defined because the cpumask must be an RCU-trusted pointer
> > >   such as task->cpus_ptr.
> > > - bpf_iter_cpumask_next
> > > - bpf_iter_cpumask_destroy
> > >
> > > These new kfuncs facilitate the iteration of percpu data, such as
> > > runqueues, psi_cgroup_cpu, and more.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> >
> > Thanks for working on this, this will be nice to have!
> >
> > > ---
> > >  kernel/bpf/cpumask.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 82 insertions(+)
> > >
> > > diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> > > index 2e73533a3811..474072a235d6 100644
> > > --- a/kernel/bpf/cpumask.c
> > > +++ b/kernel/bpf/cpumask.c
> > > @@ -422,6 +422,85 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> > >  	return cpumask_weight(cpumask);
> > >  }
> > >
> > > +struct bpf_iter_cpumask {
> > > +	__u64 __opaque[2];
> > > +} __aligned(8);
> > > +
> > > +struct bpf_iter_cpumask_kern {
> > > +	struct cpumask *mask;
> > > +	int cpu;
> > > +} __aligned(8);
> >
> > Why do we need both of these if we're not going to put the opaque
> > iterator in UAPI?
>
> Good point! Will remove it.
> It aligns with the pattern seen in
> bpf_iter_{css,task,task_vma,task_css}_kern, suggesting that we should
> indeed eliminate them.

It feels a bit cleaner to have an API-oriented (despite being unstable and
coming from vmlinux.h) iter struct like bpf_iter_cpumask with just an
"__opaque" field, and then a _kern variant with the actual memory layout.
Technically, the _kern struct could grow smaller. I certainly wanted this
split for bpf_iter_num, as that one is more of a general-purpose and
stable struct.
It's less relevant for the more unstable iters here.

> > > +
> > > +/**
> > > + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
> > > + * @it: The new bpf_iter_cpumask to be created.
> > > + * @mask: The cpumask to be iterated over.
> > > + *
> > > + * This function initializes a new bpf_iter_cpumask structure for iterating over
> > > + * the specified CPU mask. It assigns the provided cpumask to the newly created
> > > + * bpf_iter_cpumask @it for subsequent iteration operations.
> > > + *
> > > + * On success, 0 is returned. On failure, ERR is returned.
> > > + */
> > > +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> > > +{
> > > +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +
> > > +	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> > > +	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> > > +		     __alignof__(struct bpf_iter_cpumask));
> >
> > Why are we checking > in the first expression instead of just plain
> > equality?
>
> Similar to the previous case, it aligns with others. Once we eliminate
> the struct bpf_iter_cpumask_kern, we can safely discard these
> BUILD_BUG_ON() statements as well.
>
> > > +
> > > +	kit->mask = bpf_mem_alloc(&bpf_global_ma, sizeof(struct cpumask));
> >
> > Probably better to use cpumask_size() here.
>
> Will use it.
>
> > > +	if (!kit->mask)
> > > +		return -ENOMEM;
> > > +
> > > +	cpumask_copy(kit->mask, mask);
> > > +	kit->cpu = -1;
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
> > > + * @it: The bpf_iter_cpumask
> > > + *
> > > + * This function retrieves a pointer to the number of the next CPU within the
> > > + * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
> > > + * cpumask. If there are no further CPUs available, it returns NULL.
> > > + *
> > > + * Returns a pointer to the number of the next CPU in the cpumask, or NULL
> > > + * if there are no further CPUs.
> > > + */
> > > +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> > > +{
> > > +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +	const struct cpumask *mask = kit->mask;
> > > +	int cpu;
> > > +
> > > +	if (!mask)
> > > +		return NULL;
> > > +	cpu = cpumask_next(kit->cpu, mask);
> > > +	if (cpu >= nr_cpu_ids)
> > > +		return NULL;
> > > +
> > > +	kit->cpu = cpu;
> > > +	return &kit->cpu;
> > > +}
> > > +
> > > +/**
> > > + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
> > > + * @it: The bpf_iter_cpumask to be destroyed.
> > > + *
> > > + * Destroy the resources associated with the bpf_iter_cpumask.
> > > + */
> > > +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
> > > +{
> > > +	struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +
> > > +	if (!kit->mask)
> > > +		return;
> > > +	bpf_mem_free(&bpf_global_ma, kit->mask);
> > > +}
> > > +
> > >  __bpf_kfunc_end_defs();
> > >
> > >  BTF_SET8_START(cpumask_kfunc_btf_ids)
> > > @@ -450,6 +529,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
> > >  BTF_SET8_END(cpumask_kfunc_btf_ids)
> > >
> > >  static const struct btf_kfunc_id_set cpumask_kfunc_set = {
> > > --
> > > 2.39.1
> > >
>
> --
> Regards
> Yafang