Re: [PATCH bpf-next v2 1/4] bpf/crib: Introduce task_file open-coded iterator kfuncs

Juntong Deng <juntong.deng@xxxxxxxxxxx> · Fri, 1 Nov 2024 20:22:41 +0000

On 2024/11/1 19:06, Andrii Nakryiko wrote:
On Tue, Oct 29, 2024 at 5:15 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:

This patch adds the open-coded iterator style process file iterator
kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all
files opened by the specified process.

In addition, this patch adds bpf_iter_task_file_get_fd() getter to get
the file descriptor corresponding to the file in the current iteration.

The reference to struct file acquired by the previous
bpf_iter_task_file_next() is released in the next
bpf_iter_task_file_next(), and the last reference is released in the
last bpf_iter_task_file_next() that returns NULL.

In the bpf_iter_task_file_destroy(), if the iterator does not iterate to
the end, then the last struct file reference is released at this time.

Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx>
---
  kernel/bpf/Makefile      |   1 +
  kernel/bpf/crib/Makefile |   3 ++
  kernel/bpf/crib/crib.c   |  29 +++++++++++
  kernel/bpf/crib/files.c  | 105 +++++++++++++++++++++++++++++++++++++++
  4 files changed, 138 insertions(+)
  create mode 100644 kernel/bpf/crib/Makefile
  create mode 100644 kernel/bpf/crib/crib.c
  create mode 100644 kernel/bpf/crib/files.c

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 105328f0b9c0..933d36264e5e 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
  obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
  obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
  obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o
+obj-$(CONFIG_BPF_SYSCALL) += crib/
diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile
new file mode 100644
index 000000000000..4e1bae1972dd
--- /dev/null
+++ b/kernel/bpf/crib/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o
diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c
new file mode 100644
index 000000000000..e6536ee9a845
--- /dev/null
+++ b/kernel/bpf/crib/crib.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Checkpoint/Restore In eBPF (CRIB)
+ */
+
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+
+BTF_KFUNCS_START(bpf_crib_kfuncs)
+
+BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd)
+BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY)

This is in no way CRIB-specific, right? So I'd drop the CRIB reference
and move code next to task_file BPF iterator program type
implementation, this is a generic functionality.

Even more so, given Namhyung's recent work on adding kmem_cache
iterator (both program type and open-coded iterator), it seems like it
should be possible to cut down on code duplication by using open-coded
iterator logic inside the BPF iterator program. Now that you are
adding task_file open-coded iterator, can you please check if it can
be reused. See kernel/bpf/task_iter.c (and I think that's where your
code should live as well, btw).

pw-bot: cr


Thanks for your reply!

Yes, I agree that it would be better to put the task_file open-coded
iterator together with the traditional task_file iterator (in the
same file).

I will move it in the next patch series.

+
+BTF_KFUNCS_END(bpf_crib_kfuncs)
+
+static const struct btf_kfunc_id_set bpf_crib_kfunc_set = {
+       .owner = THIS_MODULE,
+       .set   = &bpf_crib_kfuncs,
+};
+
+static int __init bpf_crib_init(void)
+{
+       return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set);
+}
+
+late_initcall(bpf_crib_init);
diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c
new file mode 100644
index 000000000000..ececf150303f
--- /dev/null
+++ b/kernel/bpf/crib/files.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/btf.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/net.h>
+
+struct bpf_iter_task_file {
+       __u64 __opaque[3];
+} __aligned(8);
+
+struct bpf_iter_task_file_kern {
+       struct task_struct *task;
+       struct file *file;
+       int fd;
+} __aligned(8);
+
+__bpf_kfunc_start_defs();
+
+/**
+ * bpf_iter_task_file_new() - Initialize a new task file iterator for a task,
+ * used to iterate over all files opened by a specified task
+ *
+ * @it: the new bpf_iter_task_file to be created
+ * @task: a pointer pointing to a task to be iterated over
+ */
+__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it,
+               struct task_struct *task)
+{
+       struct bpf_iter_task_file_kern *kit = (void *)it;
+
+       BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file));
+       BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) !=
+                    __alignof__(struct bpf_iter_task_file));
+
+       kit->task = task;
+       kit->fd = -1;
+       kit->file = NULL;
+
+       return 0;
+}
+
+/**
+ * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file
+ *
+ * bpf_iter_task_file_next acquires a reference to the returned struct file.
+ *
+ * The reference to struct file acquired by the previous
+ * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(),
+ * and the last reference is released in the last bpf_iter_task_file_next()
+ * that returns NULL.
+ *
+ * @it: the bpf_iter_task_file to be checked
+ *
+ * @returns a pointer to the struct file of the next file if further files
+ * are available, otherwise returns NULL
+ */
+__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it)
+{
+       struct bpf_iter_task_file_kern *kit = (void *)it;
+
+       if (kit->file)
+               fput(kit->file);
+
+       kit->fd++;
+
+       rcu_read_lock();
+       kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd);
+       rcu_read_unlock();
+
+       return kit->file;
+}
+
+/**
+ * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to
+ * the file in the current iteration
+ *
+ * @it: the bpf_iter_task_file to be checked
+ *
+ * @returns the file descriptor
+ */
+__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter)
+{
+       struct bpf_iter_task_file_kern *kit = (void *)it__iter;
+
+       return kit->fd;
+}
+

I don't think we need this. It's probably better to return a pointer
to a small struct representing "item" returned from this iterator.
Something like

struct bpf_iter_task_file_item {
     struct task_struct *task;
     struct file *file;
     int fd;
};

You can then embed this struct into struct bpf_iter_task_file and
return a pointer to it on each next() call (avoiding memory
allocation)


(naming just for illustrative purposes, I spent 0 seconds thinking about it)


Yes, I agree that it is feasible.

But there is a question here, should we expose the internal state
structure of the iterator (If we want to embed) ?

I guess that we need two versions of data structures struct bpf_iter_xxx
and struct bpf_iter_xxx_kern is for the purpose of encapsulation?

With two versions of the data structure, users can only manipulate
the iterator using the iterator kfuncs, avoiding users from directly
accessing the internal state.

After we decide to return struct bpf_iter_task_file_item, these members
will not be able to change and users can directly access/change the
internal state of the iterator.

+/**
+ * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file
+ *
+ * If the iterator does not iterate to the end, then the last
+ * struct file reference is released at this time.
+ *
+ * @it: the bpf_iter_task_file to be destroyed
+ */
+__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it)
+{
+       struct bpf_iter_task_file_kern *kit = (void *)it;
+
+       if (kit->file)
+               fput(kit->file);
+}
+
+__bpf_kfunc_end_defs();
--
2.39.5