Re: [RFC PATCH bpf-next 10/17] bpf: Add support to attach program to multiple trampolines

On Thu, Aug 25, 2022 at 10:44 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Aug 25, 2022 at 9:08 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 23, 2022 at 06:22:37PM -0700, Alexei Starovoitov wrote:
> > > On Mon, Aug 08, 2022 at 04:06:19PM +0200, Jiri Olsa wrote:
> > > > Adding support to attach program to multiple trampolines
> > > > with new attach/detach interface:
> > > >
> > > >   int bpf_trampoline_multi_attach(struct bpf_tramp_prog *tp,
> > > >                                   struct bpf_tramp_id *id)
> > > >   int bpf_trampoline_multi_detach(struct bpf_tramp_prog *tp,
> > > >                                   struct bpf_tramp_id *id)
> > > >
> > > > The program is passed as bpf_tramp_prog object and trampolines to
> > > > attach it to are passed as bpf_tramp_id object.
> > > >
> > > > The interface creates new bpf_trampoline object which is initialized
> > > > as 'multi' trampoline and stored separately from standard trampolines.
> > > >
> > > > The following rules describe how the standard and multi trampolines
> > > > get along:
> > > >   - multi trampoline can attach on top of existing single trampolines,
> > > >     which creates 2 types of function IDs:
> > > >
> > > >       1) single-IDs - functions that are attached within existing single
> > > >          trampolines
> > > >       2) multi-IDs  - functions that were 'free' and are now taken by new
> > > >          'multi' trampoline
> > > >
> > > >   - we allow overlapping of 2 'multi' trampolines if they are attached
> > > >     to same IDs
> > > >   - we do not allow any other overlapping of 2 'multi' trampolines
> > > >   - any new 'single' trampoline cannot attach to existing multi-IDs.
> > > >
> > > > Maybe better explained with the following example:
> > > >
> > > >    - you want to attach program P to functions A,B,C,D,E,F
> > > >      via bpf_trampoline_multi_attach
> > > >
> > > >    - D,E,F already have standard trampoline attached
> > > >
> > > >    - the bpf_trampoline_multi_attach will create new 'multi' trampoline
> > > >      which spans over A,B,C functions and attach program P to single
> > > >      trampolines D,E,F
> > > >
> > > >    - A,B,C functions are now 'not attachable' by any trampoline
> > > >      until the above 'multi' trampoline is released
> > >
> > > This restriction is probably too severe.
> > > Song added support for trampoline and KLPs to co-exist on the same function.
> > > This multi trampoline restriction will resurface the same issue.
> > > afaik this restriction is only because multi trampoline image
> > > is the same for A,B,C. This memory optimization is probably going too far.
> > > How about we keep existing logic of one tramp image per function.
> > > Pretend that multi-prog P matches BTF of the target function,
> > > create normal tramp for it and attach prog P there.
> > > The prototype of P allows six u64. The args are potentially reading
> > > garbage, but there are no safety issues, since multi progs don't know BTF types.
> > >
> > > We still need a single bpf_link_multi to contain btf_ids of all functions,
> > > but it can point to many bpf tramps. One for each attach function.
> > >
> > > iirc we discussed something like this long ago, but I don't remember
> > > why we didn't go that route.
> > > arch_prepare_bpf_trampoline is fast.
> > > bpf_tramp_image_alloc is fast too.
> > > So attaching one multi-prog to thousands of btf_id-s should be fast too.
> > > The destroy part is interesting.
> > > There we will be doing thousands of bpf_tramp_image_put,
> > > but it's all async now. We used to have synchronize_rcu() which could
> > > be the reason why this approach was slow.
> > > Or is this unregister_fentry that slows it down?
> > > But register_ftrace_direct_multi() interface should have solved it
> > > for both register and unregister?
> >
> > I think it's the synchronize_rcu_tasks at the end of each ftrace update,
> > that's why we added un/register_ftrace_direct_multi that makes the changes
> > for multiple ips and syncs once at the end
>
> hmm. Can synchronize_rcu_tasks be made optional?
> For ftrace_direct that points to bpf tramps is it really needed?
>
> > un/register_ftrace_direct_multi will attach/detach multiple ips
> > to single address (trampoline), so for this approach we would need to add new
> > ftrace direct api that would allow to set multiple ips to multiple trampolines
> > within one call..
>
> right
>
> > I was already checking on that and looks doable
>
> awesome.
>
> > another problem might be that this update function will need to be called with
> > all related trampoline locks, which in this case would be thousands
>
> sure. but these will be newly allocated trampolines and
> brand new mutexes, so no contention.

I guess we still need to lock an existing tr->mutex in some cases? Say we
have 3 functions, A, B, C, and A already has tr_A. If we want to attach
tr_multi to all three, we still need to lock tr_A->mutex, no?
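
To make the question concrete, here is a minimal userspace sketch, not
kernel code. struct tramp, tramp_lock_all() and tramp_unlock_all() are
hypothetical names, and pthread mutexes stand in for tr->mutex. It assumes
the batch update takes every lock in ascending key order and counts how
many of the locked trampolines existed before the multi attach, since
those are the only ones that can be contended:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct bpf_trampoline. */
struct tramp {
	unsigned long key;	/* stand-in for the trampoline key */
	int preexisting;	/* 1 if attached before the multi attach */
	pthread_mutex_t mutex;	/* stand-in for tr->mutex */
};

/* Order trampoline pointers by key so every caller locks in the same order. */
static int cmp_tramp(const void *a, const void *b)
{
	const struct tramp *ta = *(const struct tramp * const *)a;
	const struct tramp *tb = *(const struct tramp * const *)b;

	return (ta->key > tb->key) - (ta->key < tb->key);
}

/*
 * Sort the batch by key, lock every trampoline in that order and return
 * how many of them were pre-existing, i.e. potentially contended.
 */
static size_t tramp_lock_all(struct tramp **trs, size_t cnt)
{
	size_t i, contended = 0;

	qsort(trs, cnt, sizeof(*trs), cmp_tramp);
	for (i = 0; i < cnt; i++) {
		int rc = pthread_mutex_lock(&trs[i]->mutex);

		assert(rc == 0);
		(void)rc;
		contended += trs[i]->preexisting;
	}
	return contended;
}

/* Release the locks in reverse order of acquisition. */
static void tramp_unlock_all(struct tramp **trs, size_t cnt)
{
	while (cnt--)
		pthread_mutex_unlock(&trs[cnt]->mutex);
}
```

For the A, B, C case above this would report one potentially contended
lock (tr_A) and two fresh ones. The fixed ordering is only my assumption
about how two overlapping batch updates could avoid deadlocking each
other; the actual patch set may handle this differently.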

Thanks,
Song

> But thousands of cmpxchg-s will take time. Would be good to measure
> though. It might not be that bad.


