On Tue, Feb 6, 2024 at 12:14 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> Add three new kfuncs for bpf_iter_cpumask.
> - bpf_iter_cpumask_new
>   KF_RCU is defined because the cpumask must be a RCU trusted pointer
>   such as task->cpus_ptr.
> - bpf_iter_cpumask_next
> - bpf_iter_cpumask_destroy
>
> These new kfuncs facilitate the iteration of percpu data, such as
> runqueues, psi_cgroup_cpu, and more.
>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> ---
>  kernel/bpf/cpumask.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 79 insertions(+)
>
> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> index dad0fb1c8e87..ed6078cfa40e 100644
> --- a/kernel/bpf/cpumask.c
> +++ b/kernel/bpf/cpumask.c
> @@ -422,6 +422,82 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
>         return cpumask_weight(cpumask);
>  }
>
> +struct bpf_iter_cpumask {
> +        __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_cpumask_kern {
> +        struct cpumask *mask;
> +        int cpu;
> +} __aligned(8);
> +
> +/**
> + * bpf_iter_cpumask_new() - Initialize a new CPU mask iterator for a given CPU mask
> + * @it: The new bpf_iter_cpumask to be created.
> + * @mask: The cpumask to be iterated over.
> + *
> + * This function initializes a new bpf_iter_cpumask structure for iterating over
> + * the specified CPU mask. It assigns the provided cpumask to the newly created
> + * bpf_iter_cpumask @it for subsequent iteration operations.
> + *
> + * On success, 0 is returned. On failure, ERR is returned.
> + */
> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> +{
> +        struct bpf_iter_cpumask_kern *kit = (void *)it;
> +
> +        BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> +        BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> +                     __alignof__(struct bpf_iter_cpumask));
> +
> +        kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
> +        if (!kit->mask)
> +                return -ENOMEM;
> +
> +        cpumask_copy(kit->mask, mask);

Since it's bpf_mem_alloc() plus a memcpy, how about we make it more
generic? Instead of making it cpumask specific, let's pass an arbitrary
"void *unsafe_addr, u32 size", allocate that much, and
probe_read_kernel() into the buffer.

> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> +{
> +        struct bpf_iter_cpumask_kern *kit = (void *)it;
> +        const struct cpumask *mask = kit->mask;
> +        int cpu;
> +
> +        if (!mask)
> +                return NULL;
> +        cpu = cpumask_next(kit->cpu, mask);

Instead of cpumask_next(), call find_next_bit().

> +        if (cpu >= nr_cpu_ids)
> +                return NULL;

And instead of nr_cpu_ids we can check the size in bits of the copied
bit array.

> BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)

KF_RCU is also not needed then. Such an iterator will be callable from
anywhere and on any address.

wdyt?
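
To make the idea concrete, here is an untested sketch of the generic
version. The bpf_iter_bits naming and field layout are placeholders I
made up for illustration, and I'm using copy_from_kernel_nofault() as
the in-kernel spelling of "probe_read_kernel into the buffer":

/*
 * Sketch only: copy an arbitrary kernel bit array into an
 * iterator-owned buffer, then walk it with find_next_bit().
 */
struct bpf_iter_bits {
        __u64 __opaque[2];
} __aligned(8);

struct bpf_iter_bits_kern {
        unsigned long *bits;
        u32 nr_bits;    /* size of the copied array, in bits */
        int bit;        /* last returned bit, -1 before the first next() */
} __aligned(8);

__bpf_kfunc int bpf_iter_bits_new(struct bpf_iter_bits *it, void *unsafe_addr, u32 size)
{
        struct bpf_iter_bits_kern *kit = (void *)it;
        int err;

        BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) > sizeof(struct bpf_iter_bits));
        BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
                     __alignof__(struct bpf_iter_bits));

        kit->bits = NULL;
        kit->nr_bits = 0;
        kit->bit = -1;

        kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
        if (!kit->bits)
                return -ENOMEM;

        /* No trusted/RCU requirement on @unsafe_addr: just probe-read it. */
        err = copy_from_kernel_nofault(kit->bits, unsafe_addr, size);
        if (err) {
                bpf_mem_free(&bpf_global_ma, kit->bits);
                kit->bits = NULL;
                return err;
        }

        kit->nr_bits = size * BITS_PER_BYTE;
        return 0;
}

__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
{
        struct bpf_iter_bits_kern *kit = (void *)it;

        if (!kit->bits)
                return NULL;

        /* Bound the walk by the copied buffer, not by nr_cpu_ids. */
        kit->bit = find_next_bit(kit->bits, kit->nr_bits, kit->bit + 1);
        if (kit->bit >= kit->nr_bits)
                return NULL;

        return &kit->bit;
}

__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it)
{
        struct bpf_iter_bits_kern *kit = (void *)it;

        if (!kit->bits)
                return;
        bpf_mem_free(&bpf_global_ma, kit->bits);
}

With something like that, the cpumask case on the prog side would just
be a call along the lines of bpf_iter_bits_new(&it, p->cpus_ptr,
cpumask_size()), and since we only probe-read the memory, the
KF_RCU/trusted-arg restriction isn't needed.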