On 1/18/24 4:51 PM, Hou Tao wrote:
Hi,
On 1/19/2024 6:27 AM, Yonghong Song wrote:
On 1/16/24 6:48 PM, Yafang Shao wrote:
Add three new kfuncs for bpf_iter_cpumask.
- bpf_iter_cpumask_new
It is defined with KF_RCU_PROTECTED and KF_RCU.
KF_RCU_PROTECTED is defined because we must use it under the
protection of RCU.
KF_RCU is defined because the cpumask must be a RCU trusted pointer
such as task->cpus_ptr.
I am not sure whether we need both or not.
KF_RCU_PROTECTED means the function call needs to be within the rcu cs.
KF_RCU means the argument usage needs to be within the rcu cs.
We only need one of them (preferably KF_RCU).
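To make the distinction concrete, a hypothetical BPF-side usage could look
like the sketch below (not part of this patch; the __ksym externs and the
attach point are illustrative assumptions):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* kfuncs added by this patch, plus the existing RCU lock/unlock kfuncs */
extern int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it,
				const struct cpumask *mask) __ksym;
extern int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it) __ksym;
extern void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it) __ksym;
extern void bpf_rcu_read_lock(void) __ksym;
extern void bpf_rcu_read_unlock(void) __ksym;

char _license[] SEC("license") = "GPL";

SEC("fentry.s/kernel_clone")
int dump_allowed_cpus(void *ctx)
{
	struct task_struct *p = bpf_get_current_task_btf();
	struct bpf_iter_cpumask it;
	int *cpu;

	/* In a sleepable program, KF_RCU_PROTECTED requires the call to
	 * be inside an RCU read-side cs, and KF_RCU requires p->cpus_ptr
	 * to be an RCU-trusted pointer within that cs.
	 */
	bpf_rcu_read_lock();
	if (!bpf_iter_cpumask_new(&it, p->cpus_ptr)) {
		while ((cpu = bpf_iter_cpumask_next(&it)))
			bpf_printk("allowed cpu %d", *cpu);
	}
	bpf_iter_cpumask_destroy(&it);
	bpf_rcu_read_unlock();
	return 0;
}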
- bpf_iter_cpumask_next
- bpf_iter_cpumask_destroy
These new kfuncs facilitate the iteration of percpu data, such as
runqueues, psi_cgroup_cpu, and more.
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
---
kernel/bpf/cpumask.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 2e73533a3811..1840e48e6142 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -422,6 +422,72 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
return cpumask_weight(cpumask);
}
+struct bpf_iter_cpumask {
+ __u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_cpumask_kern {
+ const struct cpumask *mask;
+ int cpu;
+} __aligned(8);
+
+/**
+ * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
+ * @it: The new bpf_iter_cpumask to be created.
+ * @mask: The cpumask to be iterated over.
+ *
+ * This function initializes a new bpf_iter_cpumask structure for iterating over
+ * the specified CPU mask. It assigns the provided cpumask to the newly created
+ * bpf_iter_cpumask @it for subsequent iteration operations.
+ *
+ * On success, 0 is returned. On failure, a negative errno is returned.
+ */
+__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it,
+				     const struct cpumask *mask)
+{
+ struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
+ __alignof__(struct bpf_iter_cpumask));
+
+ kit->mask = mask;
+ kit->cpu = -1;
+ return 0;
+}
We have a problem here. Let us say bpf_iter_cpumask_new() is called inside
the rcu cs. Once control goes out of the rcu cs, 'mask' could be freed,
right? Or you require bpf_iter_cpumask_next() to be in the same rcu cs
as bpf_iter_cpumask_new(). But such a requirement seems odd.
So the case is possible when bpf_iter_cpumask_new() and
bpf_iter_cpumask_next() are used in a sleepable program and these two
kfuncs are in two different rcu_read_lock/rcu_read_unlock code blocks, right?
Right, or bpf_iter_cpumask_new() inside rcu cs and bpf_iter_cpumask_next() not.
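In that case the problematic pattern would be something like this
hypothetical sleepable-program fragment:

	/* first RCU read-side critical section */
	bpf_rcu_read_lock();
	bpf_iter_cpumask_new(&it, p->cpus_ptr);	/* saves the raw mask pointer */
	bpf_rcu_read_unlock();	/* p->cpus_ptr may be freed after this point */

	/* second, separate RCU read-side critical section */
	bpf_rcu_read_lock();
	cpu = bpf_iter_cpumask_next(&it);	/* dereferences a possibly-freed mask */
	bpf_rcu_read_unlock();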
I think we can do things similar to bpf_iter_task_vma. You can allocate
memory with bpf_mem_alloc() in bpf_iter_cpumask_new() to keep a copy of
the mask. This way, you do not need to worry about potential
use-after-free issues. The memory can be freed in bpf_iter_cpumask_destroy().
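If it helps, here is a minimal, untested sketch of that approach, modeled
on bpf_iter_task_vma. It assumes kit->mask becomes a non-const
'struct cpumask *' so cpumask_copy() can write through it, and reuses the
global bpf_global_ma allocator:

__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it,
				     const struct cpumask *mask)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	kit->cpu = -1;
	/* copy @mask into iterator-owned memory so the iterator no longer
	 * depends on the RCU lifetime of the caller's cpumask
	 */
	kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
	if (!kit->mask)
		return -ENOMEM;

	cpumask_copy(kit->mask, mask);
	return 0;
}

__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
{
	struct bpf_iter_cpumask_kern *kit = (void *)it;

	/* new() may have failed, so destroy() can still be called with
	 * kit->mask == NULL
	 */
	if (kit->mask)
		bpf_mem_free(&bpf_global_ma, kit->mask);
}

(bpf_iter_cpumask_next() would also need a NULL check on kit->mask.)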
+
+/**
+ * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask
+ *
+ * This function retrieves a pointer to the number of the next CPU within the
+ * specified bpf_iter_cpumask. It allows sequential access to the CPUs within
+ * the cpumask. If there are no further CPUs available, it returns NULL.
+ *
+ * Returns a pointer to the number of the next CPU in the cpumask, or NULL if
+ * there are no further CPUs.
+ */
+__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
+{
+ struct bpf_iter_cpumask_kern *kit = (void *)it;
+ const struct cpumask *mask = kit->mask;
+ int cpu;
+
+ cpu = cpumask_next(kit->cpu, mask);
+ if (cpu >= nr_cpu_ids)
+ return NULL;
+
+ kit->cpu = cpu;
+ return &kit->cpu;
+}
+
+/**
+ * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask to be destroyed.
+ */
+__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
+{
+}
+
__bpf_kfunc_end_defs();
BTF_SET8_START(cpumask_kfunc_btf_ids)
@@ -450,6 +516,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU_PROTECTED | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
BTF_SET8_END(cpumask_kfunc_btf_ids)
static const struct btf_kfunc_id_set cpumask_kfunc_set = {
.