On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote: > > Adding BPF_TRAMPOLINE_BATCH_ATTACH support, that allows to attach > tracing multiple fentry/fexit pograms to trampolines within one > syscall. > > Currently each tracing program is attached in seprate bpf syscall > and more importantly by separate register_ftrace_direct call, which > registers trampoline in ftrace subsystem. We can save some cycles > by simple using its batch variant register_ftrace_direct_ips. > > Before: > > Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s* > { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs): > > 2,199,433,771 cycles:k ( +- 0.55% ) > 936,105,469 cycles:u ( +- 0.37% ) > > 26.48 +- 3.57 seconds time elapsed ( +- 13.49% ) > > After: > > Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s* > { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs): > > 1,456,854,867 cycles:k ( +- 0.57% ) > 937,737,431 cycles:u ( +- 0.13% ) > > 12.44 +- 2.98 seconds time elapsed ( +- 23.95% ) > > The new BPF_TRAMPOLINE_BATCH_ATTACH syscall command expects > following data in union bpf_attr: > > struct { > __aligned_u64 in; > __aligned_u64 out; > __u32 count; > } trampoline_batch; > > in - pointer to user space array with file descrptors of loaded bpf > programs to attach > out - pointer to user space array for resulting link descriptor > count - number of 'in/out' file descriptors > > Basically the new code gets programs from 'in' file descriptors and > attaches them the same way the current code does, apart from the last > step that registers probe ip with trampoline. This is done at the end > with new register_ftrace_direct_ips function. > > The resulting link descriptors are written in 'out' array and match > 'in' array file descriptors order. > I think this is a pretty hard API to use correctly from user-space. Think about all those partially attached and/or partially detached BPF programs. And subsequent clean up for them. Also there is nothing even close to atomicity, so you might get a spurious invocation a few times before batch-attach fails mid-way and the kernel (hopefully) will detach those already attached programs in an attempt to clean everything up. Debugging and handling that is a big pain for users, IMO. Here's a raw idea, let's think if it would be possible to implement something like this. It seems like what you need is to create a set of logically-grouped placeholders for multiple functions you are about to attach to. Until the BPF program is attached, those placeholders are just no-ops (e.g., they might jump to an "inactive" single trampoline, which just immediately returns). Then you attach the BPF program atomically into a single place, and all those no-op jumps to a trampoline start to call the BPF program at the same time. It's not strictly atomic, but is much closer in time with each other. Also, because it's still a single trampoline, you get a nice mapping to a single bpf_link, so detaching is not an issue. Basically, maybe ftrace subsystem could provide a set of APIs to prepare a set of functions to attach to. Then BPF subsystem would just do what it does today, except instead of attaching to a specific kernel function, it would attach to ftrace's placeholder. I don't know anything about ftrace implementation, so this might be far off. But I thought that looking at this problem from a bit of a different angle would benefit the discussion. Thoughts? > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx> > --- > include/linux/bpf.h | 15 ++++++- > include/uapi/linux/bpf.h | 7 ++++ > kernel/bpf/syscall.c | 88 ++++++++++++++++++++++++++++++++++++++-- > kernel/bpf/trampoline.c | 69 +++++++++++++++++++++++++++---- > 4 files changed, 164 insertions(+), 15 deletions(-) > [...]