On Fri, Mar 15, 2024 at 4:00 PM 梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> wrote: > > On Thu, Mar 14, 2024 at 8:27 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Tue, Mar 12, 2024 at 6:53 PM 梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> wrote: > [......] > > > What does "a hundred attachments max" means? Can't I > > > trace thousands of kernel functions with a bpf program of > > > tracing multi-link? > > > > I mean what time does it take to attach one program > > to 100 fentry-s ? > > What is the time for 1k and for 10k ? > > > > The kprobe multi test attaches to pretty much all funcs in > > /sys/kernel/tracing/available_filter_functions > > and it's fast enough to run in test_progs on every commit in bpf CI. > > See get_syms() in prog_tests/kprobe_multi_test.c > > > > Can this new multi fentry do that? > > and at what speed? > > The answer will decide how applicable this api is going to be. > > Generating different trampolines for every attach point > > is an approach as well. Pls benchmark it too. > > I see. Creating plenty of trampolines does take a lot of time, > and I'll do testing on it. > I have done a simple benchmark on creating 1000 trampolines. It is slow, quite slow, which consume up to 60s. We can't do it this way. Now, I have a bad idea. How about we introduce a "dynamic trampoline"? The basic logic of it can be: """ save regs bpfs = trampoline_lookup_ip(ip) fentry = bpfs->fentries while fentry: fentry(ctx) fentry = fentry->next call origin save return value fexit = bpfs->fexits while fexit: fexit(ctx) fexit = fexit->next xxxxxx """ And we lookup the "bpfs" by the function ip in a hash map in trampoline_lookup_ip. The type of "bpfs" is: struct bpf_array { struct bpf_prog *fentries; struct bpf_prog *fexits; struct bpf_prog *modify_returns; } When we need to attach the bpf progA to function A/B/C, we only need to create the bpf_arrayA, bpf_arrayB, bpf_arrayC and add the progA to them, and insert them to the hash map "direct_call_bpfs", and attach the "dynamic trampoline" to A/B/C. If bpf_arrayA exist, just add progA to the tail of bpf_arrayA->fentries. When we need to attach progB to B/C, just add progB to bpf_arrayB->fentries and bpf_arrayB->fentries. Compared to the trampoline, extra overhead is introduced by the hash lookuping. I have not begun to code yet, and I am not sure the overhead is acceptable. Considering that we also need to do hash lookup by the function in kprobe_multi, maybe the overhead is acceptable? Thanks! Menglong Dong > > > > > > > > > > Let's step back. > [......] > > > > For one trampoline to handle all attach points we might > > need some arch support, but we can start simple. > > Make btf_func_model with MAX_BPF_FUNC_REG_ARGS > > by calling btf_distill_func_proto() with func==NULL. > > And use that to build a trampoline. > > > > The challenge is how to use minimal number of trampolines > > when bpf_progA is attached for func1, func2, func3 > > and bpf_progB is attached to func3, func4, func5. > > We'd still need 3 trampolines: > > for func[12] to call bpf_progA, > > for func3 to call bpf_progA and bpf_progB, > > for func[45] to call bpf_progB. > > > > Jiri was trying to solve it in the past. His slides from LPC: > > https://lpc.events/event/16/contributions/1350/attachments/1033/1983/plumbers.pdf > > > > Pls study them and his prior patchsets to avoid stepping on the same rakes.