On Tue, Sep 3, 2024 at 1:19 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Mon, Sep 2, 2024 at 10:57 PM Viktor Malik <vmalik@xxxxxxxxxx> wrote: > > > > On 9/2/24 19:01, Alan Maguire wrote: > > > On 02/09/2024 07:58, Viktor Malik wrote: > > >> TL;DR > > >> > > >> This adds libbpf support for creating multiple BPF programs having the > > >> same instructions using symbol aliases. > > >> > > >> Context > > >> ======= > > >> > > >> bpftrace has so-called "wildcarded" probes which allow to attach the > > >> same program to multple different attach points. For k(u)probes, this is > > >> easy to do as we can leverage k(u)probe_multi, however, other program > > >> types (fentry/fexit, tracepoints) don't have such features. > > >> > > >> Currently, what bpftrace does is that it creates a copy of the program > > >> for each attach point. This naturally results in a lot of redundant code > > >> in the produced BPF object. > > >> > > >> Proposal > > >> ======== > > >> > > >> One way to address this problem would be to use *symbol aliases*. In > > >> short, they allow to have multiple symbol table entries for the same > > >> address. In bpftrace, we would create them using llvm::GlobalAlias. In > > >> C, it can be achieved using compiler __attribute__((alias(...))): > > >> > > >> int BPF_PROG(prog) > > >> { > > >> [...] > > >> } > > >> int prog_alias() __attribute__((alias("prog"))); > > >> > > >> When calling bpf_object__open, libbpf is currently able to discover all > > >> the programs and internally does a separate copy of the instructions for > > >> each aliased program. What libbpf cannot do, is perform relocations b/c > > >> it assumes that each instruction belongs to a single program only. The > > >> second patch of this series changes relocation collection such that it > > >> records relocations for each aliased program. With that, bpftrace can > > >> emit just one copy of the full program and an alias for each target > > >> attach point. > > >> > > >> For example, considering the following bpftrace script collecting the > > >> number of hits of each VFS function using fentry over a one second > > >> period: > > >> > > >> $ bpftrace -e 'kfunc:vfs_* { @[func] = count() } i:s:1 { exit() }' > > >> [...] > > >> > > >> this change will allow to reduce the size of the in-memory BPF object > > >> that bpftrace generates from 60K to 9K. Tbh, I'm not too keen on adding this aliasing complication just for this. It seems a bit too intrusive for something so obscure. retsnoop doesn't need this and bypasses the issue by cloning with bpf_prog_load(), can bpftrace follow the same model? If we need to add some getters to bpf_program() to make this easier, that sounds like a better long-term investment, API-wise. > > >> > > >> For reference, the bpftrace PoC is in [1]. > > >> > > >> The advantage of this change is that for BPF objects without aliases, it > > >> doesn't introduce any overhead. > > >> > > > > > > A few high-level questions - apologies in advance if I'm missing the > > > point here. > > > > > > Could bpftrace use program linking to solve this issue instead? So we'd > > > have separate progs for the various attach points associated with vfs_* > > > functions, but they would all call the same global function. That > > > _should_ reduce the memory footprint of the object I think - or are > > > there issues with doing that? > > > > That's a good suggestion, thanks! We added subprograms to bpftrace only > > relatively recently so I didn't really think about this option. I'll > > definitely give it a try as it could be even more efficient. > > > > > I also wonder if aliasing helps memory > > > footprint fully, especially if we end up with separate copies of the > > > program for relocation purposes; won't we have separate copies in-kernel > > > then too? So I _think_ the memory utilization you're concerned about is > > > not what's running in the kernel, but the BPF object representation in > > > bpftrace; is that right? > > > > Yes, that is correct. libbpf will create a copy of the program for each > > symbol in PROGBITS section that it discovers (including aliases) and the > > copies will be loaded into kernel. > > > > It's mainly the footprint of the BPF object produced by bpftrace that I > > was concerned about. (The reason is that we work on ahead-of-time > > compilation so it will directly affect the size of the pre-compiled > > binaries). But the above solution using global subprograms should reduce > > the in-kernel footprint, too, so I'll try to add it and see if it would > > work for bpftrace. > > I think it's a half solution, since prog will be duplicated anyway and > loaded multiple times into the kernel. > > I think it's better to revive this patch sets: > > v1: > https://lore.kernel.org/bpf/20240220035105.34626-1-dongmenglong.8@xxxxxxxxxxxxx/ > > v2: > https://lore.kernel.org/bpf/20240311093526.1010158-1-dongmenglong.8@xxxxxxxxxxxxx/ > > Unfortunately it went off rails, because we went deep into trying > to save bpf trampoline memory too. > +1 to adding multi-attach fentry, even if it will have to duplicate trampolines > I think it can still land. > We can load fentry program once and attach it in multiple places > with many bpf links. > That is still worth doing. This part I'm confused about, but we can postpone discussion until someone actually works on this. > > I think it should solve retsnoop and bpftrace use cases. > Not perfect, since there will be many trampolines, but still an improvement. > retsnoop is fine already through bpf_prog_load()-based cloning, but faster attachment for fentry/fexit would definitely help.