On Fri, Mar 29, 2024 at 7:17 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Mar 28, 2024 at 8:10 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > > > On Thu, 28 Mar 2024 22:43:46 +0800 > > 梦龙董 <dongmenglong.8@xxxxxxxxxxxxx> wrote: > > > > > I have done a simple benchmark on creating 1000 > > > trampolines. It is slow, quite slow, which consume up to > > > 60s. We can't do it this way. > > > > > > Now, I have a bad idea. How about we introduce > > > a "dynamic trampoline"? The basic logic of it can be: > > > > > > """ > > > save regs > > > bpfs = trampoline_lookup_ip(ip) > > > fentry = bpfs->fentries > > > while fentry: > > > fentry(ctx) > > > fentry = fentry->next > > > > > > call origin > > > save return value > > > > > > fexit = bpfs->fexits > > > while fexit: > > > fexit(ctx) > > > fexit = fexit->next > > > > > > xxxxxx > > > """ > > > > > > And we lookup the "bpfs" by the function ip in a hash map > > > in trampoline_lookup_ip. The type of "bpfs" is: > > > > > > struct bpf_array { > > > struct bpf_prog *fentries; > > > struct bpf_prog *fexits; > > > struct bpf_prog *modify_returns; > > > } > > > > > > When we need to attach the bpf progA to function A/B/C, > > > we only need to create the bpf_arrayA, bpf_arrayB, bpf_arrayC > > > and add the progA to them, and insert them to the hash map > > > "direct_call_bpfs", and attach the "dynamic trampoline" to > > > A/B/C. If bpf_arrayA exist, just add progA to the tail of > > > bpf_arrayA->fentries. When we need to attach progB to > > > B/C, just add progB to bpf_arrayB->fentries and > > > bpf_arrayB->fentries. > > > > > > Compared to the trampoline, extra overhead is introduced > > > by the hash lookuping. > > > > > > I have not begun to code yet, and I am not sure the overhead is > > > acceptable. Considering that we also need to do hash lookup > > > by the function in kprobe_multi, maybe the overhead is > > > acceptable? > > > > Sounds like you are just recreating the function management that ftrace > > has. It also can add thousands of trampolines very quickly, because it does > > it in batches. It takes special synchronization steps to attach to fentry. > > ftrace (and I believe multi-kprobes) updates all the attachments for each > > step, so the synchronization needed is only done once. > > > > If you really want to have thousands of functions, why not just register it > > with ftrace itself. It will give you the arguments via the ftrace_regs > > structure. Can't you just register a program as the callback? > > > > It will probably make your accounting much easier, and just let ftrace > > handle the fentry logic. That's what it was made to do. > > Absolutely agree. > There is no point re-inventing this logic. > > Menlong, > before you hook up into ftrace check whether > it's going to be any different from kprobe-multi, > since it's the same ftrace underneath. > I suspect it will look exactly the same. Yeah, I dig it a little. I think it is different. For multi-kprobe, it registers a ftrace_ops to ftrace_ops_list for every bpf program. This means that we can register 2 or more multi-kprobe in the same function. The bpf is called in the following step: ftrace_regs_caller | __ftrace_ops_list_func -> fprobe_handler -> kprobe_multi_link_handler -> run BPF And for trampoline, it needs to be called directly, so it can't be registered as a callback to ftrace_ops_list. It need to be called in the following step: ftrace_regs_caller | __ftrace_ops_list_func -> call_direct_funcs -> save trampoline to pt_regs->origin_ax | call pt_regs->origin_ax if not NULL > So it sounds like multi-fentry idea will be shelved once again. Enn...this is the best solution that I can think of. If it doesn't work, I suspect it will be shelved again. Thanks! Menglong Dong