On Thu, Aug 25, 2022 at 9:08 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Tue, Aug 23, 2022 at 06:22:37PM -0700, Alexei Starovoitov wrote:
> > On Mon, Aug 08, 2022 at 04:06:19PM +0200, Jiri Olsa wrote:
> > > Adding support to attach a program to multiple trampolines
> > > with a new attach/detach interface:
> > >
> > >   int bpf_trampoline_multi_attach(struct bpf_tramp_prog *tp,
> > >                                   struct bpf_tramp_id *id)
> > >   int bpf_trampoline_multi_detach(struct bpf_tramp_prog *tp,
> > >                                   struct bpf_tramp_id *id)
> > >
> > > The program is passed as a bpf_tramp_prog object and the trampolines
> > > to attach it to are passed as a bpf_tramp_id object.
> > >
> > > The interface creates a new bpf_trampoline object, which is initialized
> > > as a 'multi' trampoline and stored separately from standard trampolines.
> > >
> > > The following rules govern how standard and multi trampolines
> > > coexist:
> > >   - a multi trampoline can attach on top of existing single trampolines,
> > >     which creates 2 types of function IDs:
> > >
> > >       1) single-IDs - functions that are attached within existing single
> > >          trampolines
> > >       2) multi-IDs  - functions that were 'free' and are now taken by the
> > >          new 'multi' trampoline
> > >
> > >   - we allow 2 'multi' trampolines to overlap if they are attached
> > >     to the same IDs
> > >   - we do not allow any other overlap between 2 'multi' trampolines
> > >   - no new 'single' trampoline can attach to existing multi-IDs.
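The single-ID/multi-ID rules above can be modeled in a few lines of user-space C. This is a minimal sketch, not the kernel implementation: `enum func_state`, `multi_attach()` and `single_attach()` are hypothetical names, requested IDs are assumed distinct, and the allowed case of two multi trampolines attached to exactly the same IDs is deliberately not modeled (any overlap with a multi-ID is rejected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function attach state; not the kernel's data structures. */
enum func_state {
    FUNC_FREE,    /* no trampoline attached */
    FUNC_SINGLE,  /* served by a standard (single) trampoline */
    FUNC_MULTI,   /* taken by a 'multi' trampoline (a multi-ID) */
};

/*
 * Split the requested function IDs into single-IDs (functions that already
 * have a standard trampoline; the program is attached there) and multi-IDs
 * (free functions claimed by the new multi trampoline).
 * Simplification: any overlap with an existing multi-ID is rejected.
 * IDs are assumed distinct. On rejection, no function state is modified.
 */
static bool multi_attach(enum func_state *state, size_t n_funcs,
                         const size_t *ids, size_t n,
                         size_t *n_single, size_t *n_multi)
{
    *n_single = *n_multi = 0;

    /* First pass: refuse the whole request if it touches a multi-ID. */
    for (size_t i = 0; i < n; i++) {
        assert(ids[i] < n_funcs);
        if (state[ids[i]] == FUNC_MULTI)
            return false;
    }

    /* Second pass: classify each ID and claim the free ones. */
    for (size_t i = 0; i < n; i++) {
        if (state[ids[i]] == FUNC_SINGLE) {
            (*n_single)++;
        } else {
            state[ids[i]] = FUNC_MULTI;
            (*n_multi)++;
        }
    }
    return true;
}

/* A standard trampoline may not be created on a function that is a multi-ID. */
static bool single_attach(enum func_state *state, size_t id)
{
    if (state[id] == FUNC_MULTI)
        return false;
    state[id] = FUNC_SINGLE;
    return true;
}
```

For example, with six functions of which the last three already have standard trampolines, `multi_attach()` on all six reports three single-IDs and three multi-IDs, and a later `single_attach()` on one of the first three is rejected.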
> > >
> > > This is maybe better explained with the following example:
> > >
> > >   - you want to attach program P to functions A,B,C,D,E,F
> > >     via bpf_trampoline_multi_attach
> > >
> > >   - D,E,F already have standard trampolines attached
> > >
> > >   - bpf_trampoline_multi_attach will create a new 'multi' trampoline
> > >     which spans functions A,B,C and attach program P to the single
> > >     trampolines of D,E,F
> > >
> > >   - A,B,C functions are now 'not attachable' by any trampoline
> > >     until the above 'multi' trampoline is released
> >
> > This restriction is probably too severe.
> > Song added support for trampolines and KLPs to co-exist on the same function.
> > This multi trampoline restriction will resurface the same issue.
> > afaik this restriction exists only because the multi trampoline image
> > is shared by A,B,C. This memory optimization is probably going too far.
> > How about we keep the existing logic of one tramp image per function?
> > Pretend that multi-prog P matches the BTF of the target function,
> > create a normal tramp for it and attach prog P there.
> > The prototype of P allows six u64 args. P may be reading garbage
> > in some of them, but there are no safety issues, since multi progs
> > don't know BTF types.
> >
> > We still need a single bpf_link_multi to contain the btf_ids of all
> > functions, but it can point to many bpf tramps, one for each attached
> > function.
> >
> > iirc we discussed something like this long ago, but I don't remember
> > why we didn't go that route.
> > arch_prepare_bpf_trampoline is fast.
> > bpf_tramp_image_alloc is fast too.
> > So attaching one multi-prog to thousands of btf_id-s should be fast too.
> > The destroy part is interesting.
> > There we will be doing thousands of bpf_tramp_image_put,
> > but it's all async now. We used to have synchronize_rcu(), which could
> > have been the reason why this approach was slow.
> > Or is it unregister_fentry that slows it down?
> > But the register_ftrace_direct_multi() interface should have solved it
> > for both register and unregister?
>
> I think it's the synchronize_rcu_tasks at the end of each ftrace update;
> that's why we added un/register_ftrace_direct_multi, which makes the changes
> for multiple ips and syncs once at the end

hmm. Can synchronize_rcu_tasks be made optional?
For ftrace_direct that points to bpf tramps, is it really needed?

> un/register_ftrace_direct_multi will attach/detach multiple ips
> to a single address (trampoline), so for this approach we would need to add
> a new ftrace direct api that would allow setting multiple ips to multiple
> trampolines within one call..

right

> I was already checking on that and it looks doable

awesome.

> another problem might be that this update function will need to be called
> with all related trampoline locks held, which in this case would be thousands

sure. but these will be newly allocated trampolines and brand new mutexes,
so no contention.
But thousands of cmpxchg-s will take time.
Would be good to measure though. It might not be that bad.
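The cost argument above (one synchronize_rcu_tasks per ftrace update versus one per batch) can be illustrated with a toy C model. It is only a sketch under stated assumptions: `struct ip_tramp`, `update_each()`, `update_batch()` and the counter standing in for the grace period are hypothetical and are not the ftrace direct API; real updates also involve text patching and the per-trampoline locking discussed above, none of which is modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (function, trampoline) pair; not a kernel structure. */
struct ip_tramp {
    size_t func;      /* index of the traced function (stand-in for its ip) */
    uintptr_t tramp;  /* address of that function's own trampoline image */
};

/* Counts how often the expensive grace period (synchronize_rcu_tasks in
 * the real code) would be waited for. */
static unsigned int grace_periods;

static void wait_grace_period(void)
{
    grace_periods++;
}

/* Hypothetical patch table: slot i holds the trampoline function i jumps to. */
static void patch_one(uintptr_t *table, const struct ip_tramp *e)
{
    table[e->func] = e->tramp;
}

/* One update per function: pays one grace period per function. */
static void update_each(uintptr_t *table, const struct ip_tramp *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        patch_one(table, &v[i]);
        wait_grace_period();
    }
}

/* Batched update: patch every function to its own trampoline, then wait once. */
static void update_batch(uintptr_t *table, const struct ip_tramp *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        patch_one(table, &v[i]);
    if (n)
        wait_grace_period();
}
```

Attaching to N functions costs N grace periods with `update_each()` but one with `update_batch()`, which is why a single call that sets many ips to many trampolines matters when each function gets its own trampoline.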