On Fri, Nov 1, 2024 at 1:22 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > On 2024/11/1 19:06, Andrii Nakryiko wrote: > > On Tue, Oct 29, 2024 at 5:15 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > >> > >> This patch adds the open-coded iterator style process file iterator > >> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all > >> files opened by the specified process. > >> > >> In addition, this patch adds bpf_iter_task_file_get_fd() getter to get > >> the file descriptor corresponding to the file in the current iteration. > >> > >> The reference to struct file acquired by the previous > >> bpf_iter_task_file_next() is released in the next > >> bpf_iter_task_file_next(), and the last reference is released in the > >> last bpf_iter_task_file_next() that returns NULL. > >> > >> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to > >> the end, then the last struct file reference is released at this time. > >> > >> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx> > >> --- > >> kernel/bpf/Makefile | 1 + > >> kernel/bpf/crib/Makefile | 3 ++ > >> kernel/bpf/crib/crib.c | 29 +++++++++++ > >> kernel/bpf/crib/files.c | 105 +++++++++++++++++++++++++++++++++++++++ > >> 4 files changed, 138 insertions(+) > >> create mode 100644 kernel/bpf/crib/Makefile > >> create mode 100644 kernel/bpf/crib/crib.c > >> create mode 100644 kernel/bpf/crib/files.c > >> > >> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > >> index 105328f0b9c0..933d36264e5e 100644 > >> --- a/kernel/bpf/Makefile > >> +++ b/kernel/bpf/Makefile > >> @@ -53,3 +53,4 @@ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o > >> obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o > >> obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o > >> obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o > >> +obj-$(CONFIG_BPF_SYSCALL) += crib/ > >> diff --git a/kernel/bpf/crib/Makefile b/kernel/bpf/crib/Makefile > >> new file mode 100644 > >> index 000000000000..4e1bae1972dd > >> --- /dev/null > >> +++ b/kernel/bpf/crib/Makefile > >> @@ -0,0 +1,3 @@ > >> +# SPDX-License-Identifier: GPL-2.0 > >> + > >> +obj-$(CONFIG_BPF_SYSCALL) += crib.o files.o > >> diff --git a/kernel/bpf/crib/crib.c b/kernel/bpf/crib/crib.c > >> new file mode 100644 > >> index 000000000000..e6536ee9a845 > >> --- /dev/null > >> +++ b/kernel/bpf/crib/crib.c > >> @@ -0,0 +1,29 @@ > >> +// SPDX-License-Identifier: GPL-2.0 > >> +/* > >> + * Checkpoint/Restore In eBPF (CRIB) > >> + */ > >> + > >> +#include <linux/bpf.h> > >> +#include <linux/btf.h> > >> +#include <linux/btf_ids.h> > >> + > >> +BTF_KFUNCS_START(bpf_crib_kfuncs) > >> + > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) > > > > This is in no way CRIB-specific, right? So I'd drop the CRIB reference > > and move code next to task_file BPF iterator program type > > implementation, this is a generic functionality. > > > > Even more so, given Namhyung's recent work on adding kmem_cache > > iterator (both program type and open-coded iterator), it seems like it > > should be possible to cut down on code duplication by using open-coded > > iterator logic inside the BPF iterator program. Now that you are > > adding task_file open-coded iterator, can you please check if it can > > be reused. See kernel/bpf/task_iter.c (and I think that's where your > > code should live as well, btw). > > > > pw-bot: cr > > > > Thanks for your reply! > > Yes, I agree that it would be better to put the task_file open-coded > iterator together with the traditional task_file iterator (in the > same file). > > I will move it in the next patch series. > > >> + > >> +BTF_KFUNCS_END(bpf_crib_kfuncs) > >> + > >> +static const struct btf_kfunc_id_set bpf_crib_kfunc_set = { > >> + .owner = THIS_MODULE, > >> + .set = &bpf_crib_kfuncs, > >> +}; > >> + > >> +static int __init bpf_crib_init(void) > >> +{ > >> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_crib_kfunc_set); > >> +} > >> + > >> +late_initcall(bpf_crib_init); > >> diff --git a/kernel/bpf/crib/files.c b/kernel/bpf/crib/files.c > >> new file mode 100644 > >> index 000000000000..ececf150303f > >> --- /dev/null > >> +++ b/kernel/bpf/crib/files.c > >> @@ -0,0 +1,105 @@ > >> +// SPDX-License-Identifier: GPL-2.0 > >> + > >> +#include <linux/btf.h> > >> +#include <linux/file.h> > >> +#include <linux/fdtable.h> > >> +#include <linux/net.h> > >> + > >> +struct bpf_iter_task_file { > >> + __u64 __opaque[3]; > >> +} __aligned(8); > >> + > >> +struct bpf_iter_task_file_kern { > >> + struct task_struct *task; > >> + struct file *file; > >> + int fd; > >> +} __aligned(8); > >> + > >> +__bpf_kfunc_start_defs(); > >> + > >> +/** > >> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, > >> + * used to iterate over all files opened by a specified task > >> + * > >> + * @it: the new bpf_iter_task_file to be created > >> + * @task: a pointer pointing to a task to be iterated over > >> + */ > >> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, > >> + struct task_struct *task) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); > >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != > >> + __alignof__(struct bpf_iter_task_file)); > >> + > >> + kit->task = task; > >> + kit->fd = -1; > >> + kit->file = NULL; > >> + > >> + return 0; > >> +} > >> + > >> +/** > >> + * bpf_iter_task_file_next() - Get the next file in bpf_iter_task_file > >> + * > >> + * bpf_iter_task_file_next acquires a reference to the returned struct file. > >> + * > >> + * The reference to struct file acquired by the previous > >> + * bpf_iter_task_file_next() is released in the next bpf_iter_task_file_next(), > >> + * and the last reference is released in the last bpf_iter_task_file_next() > >> + * that returns NULL. > >> + * > >> + * @it: the bpf_iter_task_file to be checked > >> + * > >> + * @returns a pointer to the struct file of the next file if further files > >> + * are available, otherwise returns NULL > >> + */ > >> +__bpf_kfunc struct file *bpf_iter_task_file_next(struct bpf_iter_task_file *it) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + if (kit->file) > >> + fput(kit->file); > >> + > >> + kit->fd++; > >> + > >> + rcu_read_lock(); > >> + kit->file = task_lookup_next_fdget_rcu(kit->task, &kit->fd); > >> + rcu_read_unlock(); > >> + > >> + return kit->file; > >> +} > >> + > >> +/** > >> + * bpf_iter_task_file_get_fd() - Get the file descriptor corresponding to > >> + * the file in the current iteration > >> + * > >> + * @it: the bpf_iter_task_file to be checked > >> + * > >> + * @returns the file descriptor > >> + */ > >> +__bpf_kfunc int bpf_iter_task_file_get_fd(struct bpf_iter_task_file *it__iter) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it__iter; > >> + > >> + return kit->fd; > >> +} > >> + > > > > I don't think we need this. It's probably better to return a pointer > > to a small struct representing "item" returned from this iterator. > > Something like > > > > struct bpf_iter_task_file_item { > > struct task_struct *task; > > struct file *file; > > int fd; > > }; > > > > You can then embed this struct into struct bpf_iter_task_file and > > return a pointer to it on each next() call (avoiding memory > > allocation) > > > > > > (naming just for illustrative purposes, I spent 0 seconds thinking about it) > > > > Yes, I agree that it is feasible. > > But there is a question here, should we expose the internal state > structure of the iterator (If we want to embed) ? > > I guess that we need two versions of data structures struct bpf_iter_xxx > and struct bpf_iter_xxx_kern is for the purpose of encapsulation? Yes, that's what we do for iterator state structure, and you should do that as well. bpf_iter_xxx one will be opaque (see other examples, we literally add `u64 __opaque[N];` there). But this bpf_iter_task_file_item will be sort of internal API that is returned from bpf_iter_task_file_next(). So you'll have something like struct bpf_iter_task_file { .... additional state ... struct bpf_iter_task_file_item item; }; then you have struct bpf_iter_task_file_item bpf_iter_task_file_next(struct bpf_iter_task_file *it) { struct bpf_iter_task_file_kern *kit = (void *)it; ... kit->item.task = <sometask>; kit->item.file = <file>; /* and so on */ return &kit->item; } > > With two versions of the data structure, users can only manipulate > the iterator using the iterator kfuncs, avoiding users from directly > accessing the internal state. > > After we decide to return struct bpf_iter_task_file_item, these members > will not be able to change and users can directly access/change the > internal state of the iterator. Yes, you have to carefully set up bpf_iter_task_file_item, but you could expand it in the future without breaking any old users, because you only return it by pointer and with BPF CO-RE all the field shifts will be correctly handled. > > >> +/** > >> + * bpf_iter_task_file_destroy() - Destroy a bpf_iter_task_file > >> + * > >> + * If the iterator does not iterate to the end, then the last > >> + * struct file reference is released at this time. > >> + * > >> + * @it: the bpf_iter_task_file to be destroyed > >> + */ > >> +__bpf_kfunc void bpf_iter_task_file_destroy(struct bpf_iter_task_file *it) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + if (kit->file) > >> + fput(kit->file); > >> +} > >> + > >> +__bpf_kfunc_end_defs(); > >> -- > >> 2.39.5 > >> >