Re: [PATCH bpf-next 0/2] libbpf: Implement BTF Generator API

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 2 Nov 2021 10:12:28 -0700

On Tue, Nov 2, 2021 at 3:59 AM Leonardo Di Donato <leodidonato@xxxxxxxxx> wrote:
>
> On Tue, Nov 2, 2021 at 6:55 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > No, now that I understand what exactly you are doing, it won't work.
> >
> > But ok, I gave it quite a bit of thought and tried to find a good
> > compromise between completely exposing all the libbpf internals as
> > public APIs (which is a huge price I'm not willing to accept) and
> > actually allowing you to achieve your goal (which I think is worthy to
> > achieve).
> >
> > But first. https://github.com/aquasecurity/btfhub/tree/main/tools is
> > awesome. Great work explaining a lot at exactly the right level of
> > technical details. It would be great if you published it as a
> > dedicated blog post, maybe splitting the more general information from
> > the BTF minimization problem. Both are useful, but it's too long for
> > one article. Great job, really!
> >
> > Now, to the problem at hand. And sorry for a long reply, but there is
> > quite a bit of things to unpack.
> >
> > I see this overall problem as two distinct problems:
> >   1. Knowing which fields and types libbpf is using from kernel BTF.
> > Basically, observe CO-RE logic from outside.
> >   2. Knowing information from #1, minimize BTF.
> >
> > Part #2 absolutely doesn't belong in libbpf. Libbpf exposes enough BTF
> > constructing APIs to implement this in any application, bpftool or
> > otherwise. It's also a relatively straightforward problem: mark used
> > types and fields, create a copy of BTF with only those types and
> > fields.
> >
> > So let's talk about #1, because I agree that it's extremely painful to
> > have to reimplement most of CO-RE logic just for getting the list of
> > types and fields. Here we have two sub-problems (assuming we let
> > libbpf do CO-RE relocation logic for us):
> >   a) perform CO-RE relocations but don't create BPF maps and load BPF
> > programs. Dry-run of sorts.
> >   b) exposing calculated relocation information.
> >
> > First, 1a. The problem right now is that CO-RE relocations (and
> > relocations in general) happen in the same bpf_object__load() phase
> > and it's not possible to do them without creating maps and
> > subsequently loading BPF programs first. This is very suboptimal. I've
> > actually been thinking in the background how to improve that
> > situation, because even with the recent bpf_program__insns() API,
> > added a few days ago, you still have to load the BPF program to be
> > able to clone the BPF program, and so I wanted to solve that for a
> > long while now.
> >
> > So how about we split bpf_object__load() phase into two:
> > bpf_object__prepare() and bpf_object__load(). prepare() will do all
> > the preparations (subprogs, ELF relos, also almost complete BPF map
> > relos, but not quite; I'll get to this), basically everything that
> > load does today short of actually creating maps and progs. load() then
> > will ensure that bpf_object__prepare() was called (i.e., user can do
> > just prepare(), prepare() + load() explicitly, or load() which will
> > call prepare() implicitly).
> >
> > The biggest problem I see right now is what we do about BPF map
> > relocations. I propose to perform map relocations but substitute map's
> > internal index (so that if someone dumps prog instructions after
> > prepare(), it's still meaningful, even if not yet validatable by
> > verifier). After maps are actually created, we can do another quick
> > pass over just RELO_DATA and replace map_idx with map's fd.
> >
> > It feels like we should split load further into two steps: creating
> > and pinning maps (+ finalizing FDs in instructions) and actually
> > loading programs. Again, with the same implicit calling of prepare and
> > map creation steps if the user only calls bpf_object__load(). For ease
> > of use and backwards compatibility.
> >
> > So basically, the most granular set of steps would be:
> >   1. bpf_object__open()
> >   2. bpf_object__prepare() (or bpf_object__relocate(), naming is hard)
> >   3. bpf_object__create_maps();
> >   4. bpf_object__load().
> >
> > But the old and trusty bpf_object__open() + bpf_object__load() will
> > work just as well, with load() doing steps #2-#4 automatically, if
> > necessary.
> >
> > So what does that split gives us. Few things:
> >   - it's possible to "simulate" almost all libbpf logic without
> > changing the state of the system (no maps or progs created). Yet you
> > still validate that kconfig externs, per-cpu externs, CO-RE relos, map
> > relos, etc, all that works.
> >   - libbpf can store extra information between steps 1, 2, 3, and 4,
> > but after step #4 all that extra information can be discarded and
> > cleaned up. So advanced users will get access to stuff like
> > bpf_program__insns() and CO-RE information, but most users won't have
> > to pay for that because it will be freed after bpf_object__load(). So
> > in this case, bpf_program__insns() could (should?) be discarded after
> > actual load, because if you care about instructions, you can do steps
> > #1-#3, grab instructions and copy them, if necessary. Then proceed to
> > #4 (or not) and free the memory.
> >
> > The last point is important, because to solve the problem 1b (exposing
> > CO-RE relo info), the best way to minimize public API commitments is
> > to (optionally, probably) request libbpf to record its CO-RE relo
> > decisions. Here's what I propose, specifically:
> >   1. Add something like "bool record_core_relo_info" (awful name,
> > don't use it) in bpf_object_open_opts.
> >   2. If it is set to true, libbpf will keep a "log" of CO-RE
> > relocation decisions, recording stuff like program name, instruction
> > index, local spec (i.e., root_type_id, spec string, relo kind, maybe
> > something else), target spec (kernel type_id, kernel spec string, also
> > module ID, if it's not vmlinux BTF). We can also record relocated
> > value (i.e., field offset, actual enum value, true/false for
> > existence, etc). All these are stable concepts, so I'd feel more
> > comfortable exposing them, compared to stuff like bpf_core_accessor
> > and other internal details.
> >   3. The memory for all that will be managed by libbpf for simplicity
> > of an API, and we'll expose accessors to get those arrays (at object
> > level or per-program level is TBD).
> >   4. This info will be available after the prepare() step and will be
> > discarded either at create_maps() or load().
> >
> > I think that solves problem #1 completely (at least for BTFGen case)
> > and actually provides more useful information. E.g., if someone wants
> > to track CO-RE resolution logic for some other reason, they should
> > have pretty full information (BTFGen doesn't need all of that).
> >
> > It also doesn't feel like too much complication to libbpf internals
> > (even though we'll need to be very careful with BPF map relos and
> > double-checking that we haven't missed any other critical part of the
> > process). And for most users there is no change in API or behavior.
> > And given this gives us a "framework" for more sustainable libbpf
> > "observability", I think it's a justified price to pay, overall.
> >
> > I still need to sleep on this, but this feels like a bit cleaner way
> > forward. Thoughts are welcome.
> >
> > >
> > > > If CO-RE matching style is necessary and it's the best approach then please
> > > > add new logic to bpftool. I'm not sure such api would be
> > > > useful beyond this particular case to expose as stable libbpf api.
> > >
> > > I agree 100%. Our goal is to have this available on bpftool so all the
> > > community can use it. However it'd be a shame if we can't use some of
> > > the existing logic in libbpf.
>
> Hello Andrii,
>
> I was experimenting on implementing this during the last few days by
> using the preprocessor mechanism builtin in libbpf.
>
> The bpf_object__load_progs (which happens after bpf_object__relocate)
> calls bpf_object__load_progs, which calls bpf_program__load that, for
> each program instance, calls a bpf_program_prep_t preprocessor. Such a
> callback gets passed in the current program instance, its index, its
> insn, etc.
> The bpf_object__load_progs function gets called just after
> bpf_object__relocate (in bpf_object__load_xattr) which finds relo
> candidates and applies them.
> But I guess you already know all this.
>
> So my idea was to store  the relocation info into the program
> instances, exposing them to the userspace implementation of the
> aforementioned callback.
> At that point, creating a tailored BTF for the program/s would be just
> a matter of implementing the logic for grabbing and saving them to the
> disk.
>
> Would you think this would be feasible? I think this would be a good
> use case for the preprocessor.

This preprocessor API is deprecated and is going to be removed in
libbpf 1.0. See [0].

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20211025224531.1088894-4-andrii@xxxxxxxxxx/

>
> L.