Re: [RFC bpf-next 02/13] libbpf: add btf__new_split_base_ref() creating split BTF with reference base BTF

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 4 Apr 2024 15:49:10 -0700

On Thu, Apr 4, 2024 at 8:22 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 29/03/2024 22:00, Andrii Nakryiko wrote:
> > On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>
> >> To support more robust split BTF, adding supplemental context for the
> >> base BTF type ids that split BTF refers to is required.  Without such
> >> references, a simple shuffling of base BTF type ids (without any other
> >> significant change) invalidates the split BTF.  Here the attempt is made
> >> to store additional context to make split BTF more robust.
> >>
> >> This context comes in the form of "reference" base BTF - this base BTF
> >> constitutes the minimal BTF representation needed to disambiguate split BTF
> >> references to base BTF.  So for example if a struct is referred to from
> >> split -> base BTF, a forward representation of that struct is retained
> >
> > so you decided to not bother with recording expected size? Remember we
> > were talking about situations where vmlinux struct can be embedded
> > inside the kernel module's struct. In this case *exact* struct size is
> > very important and can't be changed without breaking compatibility.
> > Shouldn't we keep track of this?
> >
>
> There are two cases here I think. The embedded case - which is rare, but
> needs size tracking - and the non-embedded case, where a struct/union
> can potentially change size without invalidating the module's
> assumptions. In the former, your suggestion about having an empty
> (sized) struct instead of a forward would work well I think. The only
> question is around the non-embedded case. We could either use the same
> approach (but then ignore the size constraints during reconciliation) or
> continue to use FWDs. The advantage of continuing to use FWDs is we
> wouldn't have to compute struct embeddedness during reconciliation, and
> the BTF itself would be a bit more informative about the constraints
> (FWD means no constraints, empty sized struct means we care about size
> due to embeddedness). What do you think?

Sure. I was going to propose using size zero, but FWD works just as
fine and avoids the ambiguity of embedding zero-sized structs. The
important part is to check and enforce size where it's important.

>
> >> in reference base BTF, and split BTF refers to it.  Base types are kept
> >> intact in reference base BTF, and reference types like pointers are stored
> >> as-is.  Avoiding struct/union/enum/enum64 expansion is important to keep
> >> reference base BTF representation to a minimum size; however anonymous
> >> struct, union and enum[64] types are represented in full since type details
> >> are needed to disambiguate the reference - the name is not enough in those
> >> cases since there is no name.  In practice these are rare; in sample
> >> cases where reference base BTF was generated for in-tree kernel modules,
> >> only a few were needed in base reference BTF.  These represent the
> >> anonymous struct/unions that are used by the module but were de-duplicated
> >> to use base vmlinux BTF ids instead.
> >>

[...]

> >> +/* Create updated split BTF with "reference" base BTF; reference base BTF
> >> + * consists of BTF information required to clarify the types that split
> >> + * BTF refers to, omitting unneeded details.  Specifically it will contain
> >> + * base types and forward declarations of structs, unions and enumerated
> >> + * types, along with associated reference types like pointers, arrays etc.
> >> + *
> >> + * The only case where structs, unions or enumerated types are fully represented
> >> + * is when they are anonymous; in such cases, info about type content is needed
> >> + * to clarify type references.
> >> + *
> >> + * We return newly-created split BTF where the split BTf refers to a newly-created
> >> + * base reference BTF. Both must be freed separately by the caller.
> >> + *
> >> + * When creating the BTF representation for a module and provided with the
> >> + * base_ref option, pahole will create split BTF using this API, and store
> >> + * the base reference BTF in the .BTF.base.reference section.
> >> + */
> >> +struct btf *btf__new_split_base_ref(struct btf *src_btf)
> >
> > I find this name very confusing, for some reason. btf__new*/btf__parse
> > is generally used to either create a new empty instance, or just
> > parse/index pre-existing BTF information. In this case I feel like
> > it's actually a different operation, where we create a derived BTF, so
> > I'd name it distinctively. How do you feel about something with
> > "distill" terminology as an action performed here. E.g.,
> > btf__distill_base() or something like that?
> >
> > The API contract is also suboptimal, given back split BTF where base
> > BTF is owned (and has to be freed). Let's keep it a bit more explicit,
> > something like:
> >
> > int btf__distill_base(struct btf *new_base_btf, struct btf *new_split_btf);
> >
> > WDYT?
> >
>
> Yeah, any suggestions around naming are very welcome! My first choice
> was "minimal base" but that seems to prioritize the minimization, which
> is really just a side-effect of the desire to add context to the
> references in the split BTF. "base reference" feels clumsy to me too.
> Ideally we want to find a name that signals that the base BTF is a
> companion to the associated split BTF, adding context for it. Describing
> the operation as distilling certainly connotes that we're removing
> non-essential info; so we'd end up with
>
> int btf__distill_base(const struct btf *src_btf, struct btf
> **new_base_btf, struct btf **new_split_btf);
>
> ...and we could call the section .BTF.distilled.base perhaps.
>

We can get cute with analogies, but I'm sure people will hate it. :)

What do you get after you remove all the non-essential cruft
(distilling the essence)? A "nugget" of goodness... ;)

I'd keep it short as .BTF.base (and know that it's this "distilled"
base BTF, not a real one), but I don't feel very strongly about this.

>
>
> >> +{
> >> +       unsigned int n = btf__type_cnt(src_btf);
> >> +       const struct btf_type *t;
> >> +       struct btf_ref ref = {};
> >> +       struct btf *base_btf = NULL, *split_btf = NULL;
> >> +       __u32 i, id = 0;
> >> +       int ret = 0;
> >> +
> >> +       /* src BTF must be split BTF. */
> >> +       if (!btf__base_btf(src_btf)) {
> >> +               errno = -EINVAL;
> >> +               return NULL;
> >> +       }
> >> +       base_btf = btf__new_empty();
> >

[...]