On Wed, Nov 6, 2024 at 11:35 AM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > This patch series adds open-coded style process file iterator > bpf_iter_task_file and file related kfuncs bpf_fget_task(), > bpf_get_file_ops_type(), and corresponding selftests test cases. > > Known future merge conflict: In linux-next task_lookup_next_fdget_rcu() > has been removed and replaced with fget_task_next() [0], but that has > not happened yet in bpf-next, so I still > use task_lookup_next_fdget_rcu() in bpf_iter_task_file_next(). > > [0]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8fd3395ec9051a52828fcca2328cb50a69dea8ef > > Although iter/task_file already exists, for CRIB we still need the > open-coded iterator style process file iterator, and the same is true > for other bpf iterators such as iter/tcp, iter/udp, etc. > > The traditional bpf iterator is more like a bpf version of procfs, but > similar to procfs, it is not suitable for CRIB scenarios that need to > obtain large amounts of complex, multi-level in-kernel information. > > The following is from previous discussions [1]. > > [1]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > This is because the context of bpf iterators is fixed and bpf iterators > cannot be nested. This means that a bpf iterator program can only > complete a specific small iterative dump task, and cannot dump > multi-level data. > > An example, when we need to dump all the sockets of a process, we need > to iterate over all the files (sockets) of the process, and iterate over > the all packets in the queue of each socket, and iterate over all data > in each packet. > > If we use bpf iterator, since the iterator can not be nested, we need to > use socket iterator program to get all the basic information of all > sockets (pass pid as filter), and then use packet iterator program to > get the basic information of all packets of a specific socket (pass pid, > fd as filter), and then use packet data iterator program to get all the > data of a specific packet (pass pid, fd, packet index as filter). > > This would be complicated and require a lot of (each iteration) > bpf program startup and exit (leading to poor performance). > > By comparison, open coded iterator is much more flexible, we can iterate > in any context, at any time, and iteration can be nested, so we can > achieve more flexible and more elegant dumping through open coded > iterators. > > With open coded iterators, all of the above can be done in a single > bpf program, and with nested iterators, everything becomes compact > and simple. > > Also, bpf iterators transmit data to user space through seq_file, > which involves a lot of open (bpf_iter_create), read, close syscalls, > context switching, memory copying, and cannot achieve the performance > of using ringbuf. > > Discussion > ---------- > > 1. Do we need bpf_iter_task_file_get_fd()? > > Andrii suggested that next() should return a pointer to > a bpf_iter_task_file_item, which contains *file and fd. > > This is feasible, but it might compromise iterator encapsulation? I don't think so, replied on v2 ([0]). I know you saw that, I'm just linking it for others. [0] https://lore.kernel.org/bpf/CAEf4Bzba2N7pxPQh8_BDrVgupZdeow_3S7xSjDmsdhL19eXb3A@xxxxxxxxxxxxxx/ > > More detailed discussion can be found at [3] [4] > > [3]: https://lore.kernel.org/bpf/CAEf4Bzbt0kh53xYZL57Nc9AWcYUKga_NQ6uUrTeU4bj8qyTLng@xxxxxxxxxxxxxx/ > [4]: https://lore.kernel.org/bpf/AM6PR03MB584814D93FE3680635DE61A199562@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > What should we do? Maybe more discussion is needed? > > 2. Where should we put CRIB related kfuncs? > > I totally agree that most of the CRIB related kfuncs are not > CRIB specific. > > The goal of CRIB is to collect all relevant information about a process, > which means we need to add kfuncs involving several different kernel > subsystems (though these kfuncs are not complex and many just help the > bpf program reach a certain data structure). > > But here is a question, where should these CRIB kfuncs be placed? > There doesn't seem to be a suitable file to put them in. > > My current idea is to create a crib folder and then create new files for > the relevant subsystems, e.g. crib/files.c, crib/socket.c, crib/mount.c > etc. Putting them in the same folder makes it easier to maintain > them centrally. > > If anyone else wants to use CRIB kfuncs, welcome to use them. > CRIB is just one of possible applications of such kfuncs, so I'd steer away from over-specifying it as CRIB. task_file open-coded iterator is generic, and should stay close to other task iterator code, as you do in this revision. bpf_get_file_ops_type() is unnecessary, as we already discussed on v2, __ksym and comparison is the way to go here. bpf_fget_task(), if VFS folks agree to add it, probably will have to stay close to other similar VFS helpers. > Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx> > --- > v2 -> v3: > 1. Move task_file open-coded iterator to kernel/bpf/helpers.c. > > 2. Fix duplicate error code 7 in test_bpf_iter_task_file(). > > 3. Add comment for case when bpf_iter_task_file_get_fd() returns -1. > > 4. Add future plans in commit message of "Add struct file related > CRIB kfuncs". > > 5. Add Discussion section to cover letter. > > v1 -> v2: > Fix a type definition error in the fd parameter of > bpf_fget_task() at crib_common.h. > > Juntong Deng (4): > bpf/crib: Introduce task_file open-coded iterator kfuncs > selftests/bpf: Add tests for open-coded style process file iterator > bpf/crib: Add struct file related CRIB kfuncs > selftests/bpf: Add tests for struct file related CRIB kfuncs > > kernel/bpf/Makefile | 1 + > kernel/bpf/crib/Makefile | 3 + > kernel/bpf/crib/crib.c | 28 ++++ > kernel/bpf/crib/files.c | 54 ++++++++ > kernel/bpf/helpers.c | 4 + > kernel/bpf/task_iter.c | 96 +++++++++++++ > tools/testing/selftests/bpf/prog_tests/crib.c | 126 ++++++++++++++++++ > .../testing/selftests/bpf/progs/crib_common.h | 25 ++++ > .../selftests/bpf/progs/crib_files_failure.c | 108 +++++++++++++++ > .../selftests/bpf/progs/crib_files_success.c | 119 +++++++++++++++++ > 10 files changed, 564 insertions(+) > create mode 100644 kernel/bpf/crib/Makefile > create mode 100644 kernel/bpf/crib/crib.c > create mode 100644 kernel/bpf/crib/files.c > create mode 100644 tools/testing/selftests/bpf/prog_tests/crib.c > create mode 100644 tools/testing/selftests/bpf/progs/crib_common.h > create mode 100644 tools/testing/selftests/bpf/progs/crib_files_failure.c > create mode 100644 tools/testing/selftests/bpf/progs/crib_files_success.c > > -- > 2.39.5 >