Re: [RFC bpf-next 0/3] libbpf: Add support for aliased BPF programs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 3, 2024 at 1:19 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Sep 2, 2024 at 10:57 PM Viktor Malik <vmalik@xxxxxxxxxx> wrote:
> >
> > On 9/2/24 19:01, Alan Maguire wrote:
> > > On 02/09/2024 07:58, Viktor Malik wrote:
> > >> TL;DR
> > >>
> > >> This adds libbpf support for creating multiple BPF programs having the
> > >> same instructions using symbol aliases.
> > >>
> > >> Context
> > >> =======
> > >>
> > >> bpftrace has so-called "wildcarded" probes which allow to attach the
> > >> same program to multple different attach points. For k(u)probes, this is
> > >> easy to do as we can leverage k(u)probe_multi, however, other program
> > >> types (fentry/fexit, tracepoints) don't have such features.
> > >>
> > >> Currently, what bpftrace does is that it creates a copy of the program
> > >> for each attach point. This naturally results in a lot of redundant code
> > >> in the produced BPF object.
> > >>
> > >> Proposal
> > >> ========
> > >>
> > >> One way to address this problem would be to use *symbol aliases*. In
> > >> short, they allow to have multiple symbol table entries for the same
> > >> address. In bpftrace, we would create them using llvm::GlobalAlias. In
> > >> C, it can be achieved using compiler __attribute__((alias(...))):
> > >>
> > >>     int BPF_PROG(prog)
> > >>     {
> > >>         [...]
> > >>     }
> > >>     int prog_alias() __attribute__((alias("prog")));
> > >>
> > >> When calling bpf_object__open, libbpf is currently able to discover all
> > >> the programs and internally does a separate copy of the instructions for
> > >> each aliased program. What libbpf cannot do, is perform relocations b/c
> > >> it assumes that each instruction belongs to a single program only. The
> > >> second patch of this series changes relocation collection such that it
> > >> records relocations for each aliased program. With that, bpftrace can
> > >> emit just one copy of the full program and an alias for each target
> > >> attach point.
> > >>
> > >> For example, considering the following bpftrace script collecting the
> > >> number of hits of each VFS function using fentry over a one second
> > >> period:
> > >>
> > >>     $ bpftrace -e 'kfunc:vfs_* { @[func] = count() } i:s:1 { exit() }'
> > >>     [...]
> > >>
> > >> this change will allow to reduce the size of the in-memory BPF object
> > >> that bpftrace generates from 60K to 9K.

Tbh, I'm not too keen on adding this aliasing complication just for
this. It seems a bit too intrusive for something so obscure.

retsnoop doesn't need this and bypasses the issue by cloning with
bpf_prog_load(), can bpftrace follow the same model? If we need to add
some getters to bpf_program() to make this easier, that sounds like a
better long-term investment, API-wise.

> > >>
> > >> For reference, the bpftrace PoC is in [1].
> > >>
> > >> The advantage of this change is that for BPF objects without aliases, it
> > >> doesn't introduce any overhead.
> > >>
> > >
> > > A few high-level questions - apologies in advance if I'm missing the
> > > point here.
> > >
> > > Could bpftrace use program linking to solve this issue instead? So we'd
> > > have separate progs for the various attach points associated with vfs_*
> > > functions, but they would all call the same global function. That
> > > _should_ reduce the memory footprint of the object I think - or are
> > > there issues with doing that?
> >
> > That's a good suggestion, thanks! We added subprograms to bpftrace only
> > relatively recently so I didn't really think about this option. I'll
> > definitely give it a try as it could be even more efficient.
> >
> > > I also wonder if aliasing helps memory
> > > footprint fully, especially if we end up with separate copies of the
> > > program for relocation purposes; won't we have separate copies in-kernel
> > > then too? So I _think_ the memory utilization you're concerned about is
> > > not what's running in the kernel, but the BPF object representation in
> > > bpftrace; is that right?
> >
> > Yes, that is correct. libbpf will create a copy of the program for each
> > symbol in PROGBITS section that it discovers (including aliases) and the
> > copies will be loaded into kernel.
> >
> > It's mainly the footprint of the BPF object produced by bpftrace that I
> > was concerned about. (The reason is that we work on ahead-of-time
> > compilation so it will directly affect the size of the pre-compiled
> > binaries). But the above solution using global subprograms should reduce
> > the in-kernel footprint, too, so I'll try to add it and see if it would
> > work for bpftrace.
>
> I think it's a half solution, since prog will be duplicated anyway and
> loaded multiple times into the kernel.
>
> I think it's better to revive this patch sets:
>
> v1:
> https://lore.kernel.org/bpf/20240220035105.34626-1-dongmenglong.8@xxxxxxxxxxxxx/
>
> v2:
> https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@xxxxxxxxxxxxx/
>
> Unfortunately it went off rails, because we went deep into trying
> to save bpf trampoline memory too.
>

+1 to adding multi-attach fentry, even if it will have to duplicate trampolines

> I think it can still land.
> We can load fentry program once and attach it in multiple places
> with many bpf links.
> That is still worth doing.

This part I'm confused about, but we can postpone discussion until
someone actually works on this.

>
> I think it should solve retsnoop and bpftrace use cases.
> Not perfect, since there will be many trampolines, but still an improvement.
>

retsnoop is fine already through bpf_prog_load()-based cloning, but
faster attachment for fentry/fexit would definitely help.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux