Hi, On 1/19/2024 6:27 AM, Yonghong Song wrote: > > On 1/16/24 6:48 PM, Yafang Shao wrote: >> Add three new kfuncs for bpf_iter_cpumask. >> - bpf_iter_cpumask_new >> It is defined with KF_RCU_PROTECTED and KF_RCU. >> KF_RCU_PROTECTED is defined because we must use it under the >> protection of RCU. >> KF_RCU is defined because the cpumask must be a RCU trusted pointer >> such as task->cpus_ptr. > > I am not sure whether we need both or not. > > KF_RCU_PROTECTED means the function call needs within the rcu cs. > KF_RCU means the argument usage needs within the rcu cs. > We only need one of them (preferrably KF_RCU). > >> - bpf_iter_cpumask_next >> - bpf_iter_cpumask_destroy >> >> These new kfuncs facilitate the iteration of percpu data, such as >> runqueues, psi_cgroup_cpu, and more. >> >> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx> >> --- >> kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 69 insertions(+) >> >> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c >> index 2e73533a3811..1840e48e6142 100644 >> --- a/kernel/bpf/cpumask.c >> +++ b/kernel/bpf/cpumask.c >> @@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct >> cpumask *cpumask) >> return cpumask_weight(cpumask); >> } >> +struct bpf_iter_cpumask { >> + __u64 __opaque[2]; >> +} __aligned(8); >> + >> +struct bpf_iter_cpumask_kern { >> + const struct cpumask *mask; >> + int cpu; >> +} __aligned(8); >> + >> +/** >> + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a >> specified cpumask >> + * @it: The new bpf_iter_cpumask to be created. >> + * @mask: The cpumask to be iterated over. >> + * >> + * This function initializes a new bpf_iter_cpumask structure for >> iterating over >> + * the specified CPU mask. It assigns the provided cpumask to the >> newly created >> + * bpf_iter_cpumask @it for subsequent iteration operations. >> + * >> + * On success, 0 is returen. On failure, ERR is returned. >> + */ >> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, >> const struct cpumask *mask) >> +{ >> + struct bpf_iter_cpumask_kern *kit = (void *)it; >> + >> + BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > >> sizeof(struct bpf_iter_cpumask)); >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) != >> + __alignof__(struct bpf_iter_cpumask)); >> + >> + kit->mask = mask; >> + kit->cpu = -1; >> + return 0; >> +} > > We have problem here. Let us say bpf_iter_cpumask_new() is called > inside rcu cs. > Once the control goes out of rcu cs, 'mask' could be freed, right? > Or you require bpf_iter_cpumask_next() needs to be in the same rcu cs > as bpf_iter_cpumask_new(). But such a requirement seems odd. So the case is possible when using bpf_iter_cpumask_new() and bpf_iter_cpumask_next() in sleepable program and these two kfuncs are used in two different rcu_read_lock/rcu_read_unlock code blocks, right ? > > I think we can do things similar to bpf_iter_task_vma. You can > allocate memory > with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of mask. > This > way, you do not need to worry about potential use-after-free issue. > The memory can be freed with bpf_iter_cpumask_destroy(). > >> + >> +/** >> + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask >> + * @it: The bpf_iter_cpumask >> + * >> + * This function retrieves a pointer to the number of the next CPU >> within the >> + * specified bpf_iter_cpumask. It allows sequential access to CPUs >> within the >> + * cpumask. If there are no further CPUs available, it returns NULL. >> + * >> + * Returns a pointer to the number of the next CPU in the cpumask or >> NULL if no >> + * further CPUs. >> + */ >> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it) >> +{ >> + struct bpf_iter_cpumask_kern *kit = (void *)it; >> + const struct cpumask *mask = kit->mask; >> + int cpu; >> + >> + cpu = cpumask_next(kit->cpu, mask); >> + if (cpu >= nr_cpu_ids) >> + return NULL; >> + >> + kit->cpu = cpu; >> + return &kit->cpu; >> +} >> + >> +/** >> + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask >> + * @it: The bpf_iter_cpumask to be destroyed. >> + */ >> +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it) >> +{ >> +} >> + >> __bpf_kfunc_end_defs(); >> BTF_SET8_START(cpumask_kfunc_btf_ids) >> @@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU) >> BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU) >> BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU) >> BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU) >> +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | >> KF_RCU_PROTECTED | KF_RCU) >> +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL) >> +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY) >> BTF_SET8_END(cpumask_kfunc_btf_ids) >> static const struct btf_kfunc_id_set cpumask_kfunc_set = { > > .