On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> Given a loaded dumper bpf program, which already
> knows which target it should bind to, there
> two ways to create a dumper:
>   - a file based dumper under hierarchy of
>     /sys/kernel/bpfdump/ which uses can
>     "cat" to print out the output.
>   - an anonymous dumper which user application
>     can "read" the dumping output.
>
> For file based dumper, BPF_OBJ_PIN syscall interface
> is used. For anonymous dumper, BPF_PROG_ATTACH
> syscall interface is used.

We discussed this offline with Yonghong a bit, but I thought I'd put my
thoughts about this in writing for completeness. To me, it seems like
the most consistent way to do both anonymous and named dumpers is
through the following steps:

1. BPF_PROG_LOAD to load/verify the program, which creates a program FD.
2. LINK_CREATE using that program FD and a direntry FD. This creates a
   dumper bpf_link (bpf_dumper_link) and returns an anonymous link FD.
   If the link FD is closed, the dumper program is detached and the
   dumper is destroyed (unless pinned in bpffs, just like with any
   other bpf_link).
3. At this point bpf_dumper_link can be treated like a factory of
   seq_files. We can add a new BPF_DUMPER_OPEN_FILE command (all names
   are for illustration purposes) that accepts a dumper link FD and
   returns a new seq_file FD, which can be read() normally (or, e.g.,
   cat'ed from shell).
4. Additionally, this anonymous bpf_link can be pinned/mounted in
   bpfdumpfs. We can do it as BPF_OBJ_PIN or as a separate command.
   Once pinned at, e.g., /sys/fs/bpfdump/task/my_dumper, just opening
   that file is equivalent to BPF_DUMPER_OPEN_FILE and will create a
   new seq_file that can be read() independently from other seq_files
   opened against the same dumper. Pinning a bpfdumpfs entry also bumps
   the refcnt of the bpf_link itself, so even if the process that
   created the link dies, the bpf dumper stays attached until its
   bpfdumpfs entry is deleted.
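
To make the lifetime rules in steps 2-4 concrete, here is a toy
userspace model of the refcounting. It is only a sketch: every name in
it (toy_dumper_link, toy_open_file, toy_pin, and so on) is made up for
illustration, it makes no syscalls, and it is not meant to reflect how
the kernel side would actually be structured. It also assumes, for
simplicity, that open seq_files do not hold a reference on the link,
which the steps above leave unspecified.

```c
/* Toy model of the proposed dumper link lifetime; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed bpf_dumper_link. */
struct toy_dumper_link {
	int refcnt;        /* holders: the link FD plus a pinned entry */
	bool pinned;       /* true while a bpfdumpfs entry exists */
	bool attached;     /* dumper program still attached to target */
	int next_session;  /* ids handed out to opened seq_files */
};

/* Hypothetical stand-in for one opened seq_file. */
struct toy_seq_file {
	int session_id;
	long pos;          /* read position, private to this file */
};

/* Drop one reference; the last put detaches the dumper. */
static void toy_link_put(struct toy_dumper_link *l)
{
	assert(l->refcnt > 0);
	if (--l->refcnt == 0)
		l->attached = false;
}

/* Step 2: LINK_CREATE returns an anonymous link FD (one reference). */
static void toy_link_create(struct toy_dumper_link *l)
{
	l->refcnt = 1;
	l->pinned = false;
	l->attached = true;
	l->next_session = 0;
}

/* Step 3: the link acts as a factory; each open is a fresh seq_file. */
static bool toy_open_file(struct toy_dumper_link *l, struct toy_seq_file *f)
{
	if (!l->attached)
		return false;
	f->session_id = l->next_session++;
	f->pos = 0;
	return true;
}

/* Step 4: pinning in bpfdumpfs takes an extra reference on the link. */
static bool toy_pin(struct toy_dumper_link *l)
{
	if (!l->attached || l->pinned)
		return false;
	l->pinned = true;
	l->refcnt++;
	return true;
}

/* Deleting the bpfdumpfs entry drops the reference taken by toy_pin(). */
static void toy_unpin(struct toy_dumper_link *l)
{
	if (!l->pinned)
		return;
	l->pinned = false;
	toy_link_put(l);
}

/* Closing the link FD (e.g. the creating process exits). */
static void toy_close_link_fd(struct toy_dumper_link *l)
{
	toy_link_put(l);
}
```

In this model, calling toy_link_create(), toy_pin() and then
toy_close_link_fd() leaves the dumper attached, and only toy_unpin()
detaches it. Without the toy_pin() call, closing the link FD alone
detaches it, which is the safe-by-default behaviour.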

Apart from the duality between BPF_DUMPER_OPEN_FILE and open()'ing a
bpfdumpfs file, this seems pretty consistent and follows the
safe-by-default auto-cleanup of an anonymous link, unless it is pinned
in bpfdumpfs (one can still pin the bpf_link in bpffs, but then it
can't be open()'ed the same way; that only keeps the BPF program from
being cleaned up).

Out of all the schemes I could come up with, this one seems the most
unified and fits nicely into the bpf_link infra. Thoughts?

>
> To facilitate target seq_ops->show() to get the
> bpf program easily, dumper creation increased
> the target-provided seq_file private data size
> so bpf program pointer is also stored in seq_file
> private data.
>
> Further, a seq_num which represents how many
> bpf_dump_get_prog() has been called is also
> available to the target seq_ops->show().
> Such information can be used to e.g., print
> banner before printing out actual data.
>
> Note the seq_num does not represent the num
> of unique kernel objects the bpf program has
> seen. But it should be a good approximate.
>
> A target feature BPF_DUMP_SEQ_NET_PRIVATE
> is implemented specifically useful for
> net based dumpers. It sets net namespace
> as the current process net namespace.
> This avoids changing existing net seq_ops
> in order to retrieve net namespace from
> the seq_file pointer.
>
> For open dumper files, anonymous or not, the
> fdinfo will show the target and prog_id associated
> with that file descriptor. For dumper file itself,
> a kernel interface will be provided to retrieve the
> prog_id in one of the later patches.
>
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---
>  include/linux/bpf.h            |   5 +
>  include/uapi/linux/bpf.h       |   6 +-
>  kernel/bpf/dump.c              | 338 ++++++++++++++++++++++++++++++++-
>  kernel/bpf/syscall.c           |  11 +-
>  tools/include/uapi/linux/bpf.h |   6 +-
>  5 files changed, 362 insertions(+), 4 deletions(-)
>

[...]