Re: [RFC bpf-next 09/13] libbpf: split BTF reconciliation

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 5 Apr 2024 12:58:16 -0700

On Fri, Apr 5, 2024 at 3:06 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 29/03/2024 22:01, Andrii Nakryiko wrote:
> > On Fri, Mar 22, 2024 at 3:26 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>
> >> Map base reference BTF type ids referenced in split BTF and their
> >> references to the base BTF passed in, and if the mapping succeeds,
> >> reparent the split BTF to the base BTF.
> >>
> >> Reconciliation rules are
> >>
> >> - base types must match exactly
> >> - enum[64] types should match all value name/value pairs, but the
> >>   to-be-reconciled enum[64] can also define additional name/value pairs
> >> - named fwds match to the correspondingly-named struct/union/enum/enum64
> >
> > yeah, but what about their size if they are embedded? Using FWD was a
> > nice trick, but it's not flexible enough for recording (optionally)
> > size... Probably emitting an empty (but named) struct/union/enum would
> > be a bit better (and actually make split BTF using base ref more valid
> > even without pre-processing).
> >
> >> - anon struct/unions must have field names/offsets specified in base
> >>   reference BTF matched by those in base BTF we are matching with
> >>
> >> Reconciliation can not recurse, since it will be used in-kernel also and
> >> we do not want to blow up the kernel stack when carrying out type
> >> compatibility checks.  Hence we use a stack for reference type
> >> reconciliation rather than recursive function calls.
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> >> ---
> >>  tools/lib/bpf/Build             |   2 +-
> >>  tools/lib/bpf/btf.c             |  58 ++++
> >>  tools/lib/bpf/btf.h             |   8 +
> >>  tools/lib/bpf/btf_reconcile.c   | 590 ++++++++++++++++++++++++++++++++
> >
> > how wrong would it be to call this process "relocate" rather than "reconcile"?
> >
>
> seems fine to me.
>
> >>  tools/lib/bpf/libbpf.map        |   1 +
> >>  tools/lib/bpf/libbpf_internal.h |   2 +
> >>  6 files changed, 660 insertions(+), 1 deletion(-)
> >>  create mode 100644 tools/lib/bpf/btf_reconcile.c
> >>
> >
> > [...]
> >
> >> +/* Find next type after *id in base BTF that matches kind of type t passed in
> >> + * and name (if it is specified).  Match fwd kinds to appropriate kind also.
> >> + */
> >> +static int btf_reconcile_find_next(struct btf_reconcile *r, const struct btf_type *t,
> >> +                                  __u32 *id, const struct btf_type **tp)
> >
> > I haven't grokked the whole patch logic just yet, doing a first pass,
> > so I might be asking stupid questions, sorry.
> >
> > But it looks like we have these linear searches here to find matching
> > types, is that right? Wouldn't it be better to build an index first to
> > speed up search?
> >
>
> It would, but the aim here was to keep things simple with an eye to
> sharing the code with the kernel. A lot of the libbpf hash stuff would
> be handy but then we'd have to have something on the kernel side. Given
> that the size of the base BTF is so small, the linear searches aren't
> much of a cost.

I didn't mean hashmap, I was thinking sort+binary search, but ok, we
can do that later as an optimization.
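
For illustration only, a minimal sketch of the sort+binary search idea,
assuming a simplified, hypothetical `struct named_type` rather than real
`struct btf_type` records. The names `index_build()` and `index_find()` are
made up for this example and are not part of libbpf. Named base types are
sorted once by (name, kind), and each lookup is then a binary search instead
of a linear scan:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a named type in base BTF. */
struct named_type {
	const char *name;	/* type name; anonymous types are not indexed */
	unsigned int kind;	/* numeric kind, e.g. struct vs typedef */
	unsigned int id;	/* type id in base BTF */
};

/* Order by name first, then by kind, so (name, kind) is the search key. */
static int cmp_named_type(const void *a, const void *b)
{
	const struct named_type *ta = a, *tb = b;
	int c = strcmp(ta->name, tb->name);

	if (c)
		return c;
	if (ta->kind != tb->kind)
		return ta->kind < tb->kind ? -1 : 1;
	return 0;
}

/* Sort the index in place; done once before any lookups. */
static void index_build(struct named_type *idx, size_t n)
{
	qsort(idx, n, sizeof(*idx), cmp_named_type);
}

/* Return the base type id for (name, kind), or 0 if there is no entry. */
static unsigned int index_find(const struct named_type *idx, size_t n,
			       const char *name, unsigned int kind)
{
	struct named_type key = { .name = name, .kind = kind };
	const struct named_type *hit;

	hit = bsearch(&key, idx, n, sizeof(*idx), cmp_named_type);
	return hit ? hit->id : 0;
}
```

If several base types share the same name and kind, `bsearch()` may return
any one of them, so a real implementation would still need a policy for that
ambiguity.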

[...]

> >> +
> >> +/* Ensure each enum value in type t has equivalent in base BTF and that values (if any) match. */
> >> +static int btf_reconcile_enum(struct btf_reconcile *r, const char *name,
> >> +                             const struct btf_type *t, const struct btf_type *bt)
> >> +{
> >
> > should we care about compatibility between ENUM and ENUM64, they can
> > both represent the same values of the same size?
> >
>
> so do you mean if one representation uses an enum, another an enum64?
> Yep that might well be the case, I'll add that too.

yep, they are two different kinds, but they overlap in what they can
represent, so we try to support them interchangeably, if possible
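
A rough sketch of what kind-agnostic enum matching could look like, under
stated assumptions: `struct enum_val` and `struct enum_desc` are hypothetical
flattened views (not libbpf types), values are widened to a signed 64-bit
integer, and signedness and size checks are left out. The check follows the
patch's function comment: every enumerator in the reference type must exist
in the base type with the same value, while the base type may carry extras.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened enumerator; 32-bit values are widened to 64 bits
 * so ENUM and ENUM64 members compare the same way.
 */
struct enum_val {
	const char *name;
	int64_t val;
};

/* Hypothetical description of an enum type of either kind. */
struct enum_desc {
	bool is_enum64;		/* deliberately ignored by the comparison */
	size_t cnt;
	const struct enum_val *vals;
};

/* True if every name/value pair in 'want' also appears in 'base'. */
static bool enum_compatible(const struct enum_desc *want,
			    const struct enum_desc *base)
{
	for (size_t i = 0; i < want->cnt; i++) {
		bool found = false;

		for (size_t j = 0; j < base->cnt; j++) {
			if (strcmp(want->vals[i].name, base->vals[j].name))
				continue;
			/* Same name: the values must agree as well. */
			found = want->vals[i].val == base->vals[j].val;
			break;
		}
		if (!found)
			return false;
	}
	return true;
}
```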

[...]

> >> +                               if (err) {
> >> +                                       pr_warn("could not find base BTF type for base reference type[%u]\n",
> >> +                                               id);
> >> +                                       return err;
> >> +                               }
> >> +                       } else {
> >> +                               if (btf_reconcile_push(r, id) < 0 ||
> >> +                                   btf_reconcile_push(r, t->type) < 0)
> >> +                                       return -ENOSPC;
> >
> > I'm missing something, please help me understand. I don't get why we
> > need a recursive algorithm at all.
> >
> > In my mind, we have this small "base ref" set of types referenced from
> > module's BTF (split BTF part), right? So all we should need is to map
> > every type from base ref set to vmlinux BTF.
> >
> > What I don't yet fully get is why CONST/VOLATILE or PTR need to
> > postpone reconciliation via a queue. By the time we get to types in
> > split BTF all base ref types should be mapped, so all you need is to
> > remap t->type to resolved vmlinux BTF, no?
> >
>
> It's possible to have multiple layers of reference in the distilled base
> BTF though; for example here's a case from a module's distilled base BTF:
>
> [41] PTR '(anon)' type_id=42
> [42] CONST '(anon)' type_id=0
>
> To resolve type id 41 we need to resolve type id 42, and since type id 0
> already has a mapping, at that point we can look for CONSTs that refer
> to type id 0 and then once we've established that mapping we can find
> PTRs that have a t->type that refers to the const. So as described below
> we keep pushing type ids onto the stack until we find one with a t->type
> mapping to base BTF; once we hit that we can start looking in base BTF
> for types that have the mapped t->type value.
>

see below

> > I suspect the answer might have something to do with those anonymous
> > structs/unions which you copy verbatim into base ref BTF?
> >
>
> They do add a few more types, but we can get base BTF references that
> don't come from that source too. The above case PTR CONST void for
> example wasn't referred to via any other distilled base types.

I think it's too problematic to allow unnamed types in base reference
BTF. If we keep the rule that only named
structs/unions/typedefs/int/float kinds can be in base reference, then
everything becomes much simpler and faster, without breaking any of
kfunc/PTR_TO_BTF_ID usage.

The biggest problem is probably a TYPEDEF pointing to an anonymous
struct/union/func proto. We might still want to record the shape of the
expected underlying type, but we would still match on just the name in
the first place. Maybe initially we should just say that any ambiguity
is rejected and keep only the TYPEDEF with its name?

Let me know if this is unrealistic, though.

>
> > But on the latter topic, I wonder if we at all need this? Why not keep
> > all those anon struct/union/enum in module's part of BTF? If they are
> > unnamed, I doubt they will ever be referenced from kfuncs or anything
> > like that, so their BTF ID isn't that important.
> >
>
> Changing the module BTF would make things a bit more complicated.

you mean appending those anonymous types from base BTF to module BTF?
It shouldn't be too hard, we just do it recursively and keep track of
ID remapping (so we don't add the same type twice). Or is there
something more?
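
As a hedged illustration of the "copy recursively and track ID remapping"
idea, here is a toy sketch. `struct mini_type`, `struct copy_ctx` and
`copy_type()` are hypothetical and far simpler than real BTF (each type just
holds up to two referenced ids). The destination slot is reserved before
recursing, so a type is never added twice and reference cycles terminate:

```c
#include <stddef.h>

#define MAX_REFS	2
#define MAX_TYPES	32

/* Hypothetical toy type: ids of referenced types, 0 meaning void/none. */
struct mini_type {
	int refs[MAX_REFS];
};

struct copy_ctx {
	const struct mini_type *src;		/* source types, indexed by id */
	struct mini_type dst[MAX_TYPES];	/* destination types */
	int ndst;				/* next free dst id; id 0 is void */
	int remap[MAX_TYPES];			/* src id -> dst id, 0 = not copied */
};

/* Copy src type 'id' and everything it references into dst.
 * Returns the dst id, or -1 if dst is full.
 */
static int copy_type(struct copy_ctx *c, int id)
{
	int new_id;

	if (id == 0)
		return 0;			/* void maps to void */
	if (c->remap[id])
		return c->remap[id];		/* already copied */
	if (c->ndst == MAX_TYPES)
		return -1;

	/* Reserve the slot first so a cycle finds the mapping. */
	new_id = c->ndst++;
	c->remap[id] = new_id;

	for (int i = 0; i < MAX_REFS; i++) {
		int r = copy_type(c, c->src[id].refs[i]);

		if (r < 0)
			return -1;
		c->dst[new_id].refs[i] = r;
	}
	return new_id;
}
```

Recursion is used here only because this sketch targets user space; an
in-kernel variant would need an explicit stack instead.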

> Currently we just update type id references and string offsets. The anon
> structs we end up with tend to be very small; from the same distilled
> base BTF used above here are the instances of struct/union:
>
> [35] STRUCT '(anon)' size=4 vlen=1
>         'counter' type_id=6 bits_offset=0
> [62] STRUCT '(anon)' size=8 vlen=1
>         'raw_lock' type_id=48 bits_offset=0
> [113] UNION '(anon)' size=8 vlen=2
>         'kernel' type_id=13 bits_offset=0
>         'user' type_id=13 bits_offset=0
> [114] STRUCT '(anon)' size=16 vlen=2
>         '(anon)' type_id=113 bits_offset=0
>         'is_kernel' type_id=11 bits_offset=64 bitfield_size=1
> [119] STRUCT '(anon)' size=8 vlen=1
>         'net' type_id=85 bits_offset=0
>
> These only added one type that wasn't referenced elsewhere - typedef
> arch_rwlock_t.
>
> > If base BTF is all named types, it would simplify the reconciliation
> > process significantly, I think.
> >
> > But again, I only skimmed the overall algorithm, sorry for my
> > laziness, but I figured it would be good to discuss the above first
> > anyways.
> >
>
> I'll try and walk through an example of how the algorithm proceeds; that
> might help make the approach concrete and we can see if it can be
> simplified.
>
> Consider split BTF that uses the base BTF type "int *", i.e. a PTR to an
> "int". It could do so in an anonymous struct/union, but also as an array
> member type or a FUNC_PROTO or VAR. For such a reference type, we first
> encounter the outer PTR. If its t->type has a mapping to base BTF "int"
> (it will), we look through the base BTF for reference types that match
> the kind (here a PTR) _and_ have the required t->type (int). Once we
> find the reference type in base BTF, we can add the mapping for PTR to
> int from distilled->base BTF id. So the reference type with one layer of
> reference is pretty straightforward.
>
> In the case where we don't have a mapping for a t->type - let's say PTR
> to PTR to int (int **) - we push the type id for PTR to PTR to int onto
> the stack along with the type id for t->type (PTR to int) after. So we
> will first pop PTR to int and go through the above-described type
> resolution. Next we will pop the PTR to PTR to int and because we've
> resolved PTR to int now, its t->type will have a mapping and we can go
> through the search process to find a PTR that refers to a PTR to int
> in base BTF, and we then add that mapping too.

sure, pretty standard dfs with memoization, makes sense

what makes me a bit uncomfortable is that if you have PTR -> STRUCT,
once you've found a match for STRUCT, you'll do a linear search for any
PTR to that STRUCT. That starts to sound like BTF deduplication (though
admittedly matching STRUCT by name removes half of BTF dedup
complexity), and the way you are doing it right now, with a linear search
over vmlinux BTF, is going to be slow (which is why I was proposing an
index).

Ok, anyways, it's probably going to work. If we can simplify further
with keeping just named types, it would be great. If not, so be it, I
suppose.
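
To make the stack-based resolution discussed in this thread concrete, here
is a minimal sketch on a toy type model. It is not the patch's code:
`struct toy_type`, `find_ref_type()` and `resolve()` are hypothetical, only
PTR and CONST reference kinds are modelled, and leaf types (void, int) are
assumed to be mapped already. An explicit stack replaces recursion, and
`map[]` acts as the memo from distilled ids to base ids:

```c
#include <stddef.h>

#define MAX_DEPTH	16
#define UNMAPPED	(-1)

enum toy_kind { TOY_VOID, TOY_INT, TOY_PTR, TOY_CONST };

/* Hypothetical toy type: a kind plus one referenced id (like t->type). */
struct toy_type {
	enum toy_kind kind;
	int ref;
};

/* Linear search of base for a 'kind' type referring to base id 'ref'. */
static int find_ref_type(const struct toy_type *base, int nbase,
			 enum toy_kind kind, int ref)
{
	for (int i = 1; i < nbase; i++)
		if (base[i].kind == kind && base[i].ref == ref)
			return i;
	return UNMAPPED;
}

/* Map distilled type 'id' to a base id without recursion.
 * Returns the base id, or UNMAPPED on failure or stack exhaustion.
 */
static int resolve(const struct toy_type *dist, const struct toy_type *base,
		   int nbase, int *map, int id)
{
	int stack[MAX_DEPTH], top = 0;

	stack[top++] = id;
	while (top > 0) {
		int cur = stack[top - 1];
		int ref;

		if (map[cur] != UNMAPPED) {	/* memoized earlier */
			top--;
			continue;
		}
		if (dist[cur].kind != TOY_PTR && dist[cur].kind != TOY_CONST)
			return UNMAPPED;	/* leaf types must be pre-mapped */
		ref = dist[cur].ref;
		if (map[ref] == UNMAPPED) {	/* resolve the referenced type first */
			if (top == MAX_DEPTH)
				return UNMAPPED;
			stack[top++] = ref;
			continue;
		}
		map[cur] = find_ref_type(base, nbase, dist[cur].kind, map[ref]);
		if (map[cur] == UNMAPPED)
			return UNMAPPED;
		top--;
	}
	return map[id];
}
```

Every resolved reference type still costs one linear pass over the base
array in `find_ref_type()`, which is the cost an index would remove.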

>
> For an array we push the member and index types, a func proto the return
> type and parameter types, etc.
>
> So once multiple layers of reference are part of the picture I _think_
> we need something like this approach.
>
> Alan
>
> >> +                       }
> >> +                       break;
> >
> > [...]