On Thu, Apr 4, 2024 at 8:22 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > > On 29/03/2024 22:00, Andrii Nakryiko wrote: > > On Fri, Mar 22, 2024 at 3:25 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >> > >> To support more robust split BTF, adding supplemental context for the > >> base BTF type ids that split BTF refers to is required. Without such > >> references, a simple shuffling of base BTF type ids (without any other > >> significant change) invalidates the split BTF. Here the attempt is made > >> to store additional context to make split BTF more robust. > >> > >> This context comes in the form of "reference" base BTF - this base BTF > >> constitutes the minimal BTF representation needed to disambiguate split BTF > >> references to base BTF. So for example if a struct is referred to from > >> split -> base BTF, a forward representation of that struct is retained > > > > so you decided to not bother with recording expected size? Remember we > > were talking about situations where vmlinux struct can be embedded > > inside the kernel module's struct. In this case *exact* struct size is > > very important and can't be changed without breaking compatibility. > > Shouldn't we keep track of this? > > > > There are two cases here I think. The embedded case - which is rare, but > needs size tracking - and the non-embedded case, where a struct/union > can potentially change size without invalidating the module's > assumptions. In the former, your suggestion about having an empty > (sized) struct instead of a forward would work well I think. The only > question is around the non-embedded case. We could either use the same > approach (but then ignore the size constraints during reconciliation) or > continue to use FWDs. The advantage of continuing to use FWDs is we > wouldn't have to compute struct embeddedness during reconciliation, and > the BTF itself would be a bit more informative about the constraints > (FWD means no constraints, empty sized struct means we care about size > due to embeddedness). What do you think? Sure. I was going to propose using size zero, but FWD works just as fine and avoids the ambiguity of embedding zero-sized structs. The important part is to check and enforce size where it's important. > > >> in reference base BTF, and split BTF refers to it. Base types are kept > >> intact in reference base BTF, and reference types like pointers are stored > >> as-is. Avoiding struct/union/enum/enum64 expansion is important to keep > >> reference base BTF representation to a minimum size; however anonymous > >> struct, union and enum[64] types are represented in full since type details > >> are needed to disambiguate the reference - the name is not enough in those > >> cases since there is no name. In practice these are rare; in sample > >> cases where reference base BTF was generated for in-tree kernel modules, > >> only a few were needed in base reference BTF. These represent the > >> anonymous struct/unions that are used by the module but were de-duplicated > >> to use base vmlinux BTF ids instead. > >> [...] > >> +/* Create updated split BTF with "reference" base BTF; reference base BTF > >> + * consists of BTF information required to clarify the types that split > >> + * BTF refers to, omitting unneeded details. Specifically it will contain > >> + * base types and forward declarations of structs, unions and enumerated > >> + * types, along with associated reference types like pointers, arrays etc. > >> + * > >> + * The only case where structs, unions or enumerated types are fully represented > >> + * is when they are anonymous; in such cases, info about type content is needed > >> + * to clarify type references. > >> + * > >> + * We return newly-created split BTF where the split BTf refers to a newly-created > >> + * base reference BTF. Both must be freed separately by the caller. > >> + * > >> + * When creating the BTF representation for a module and provided with the > >> + * base_ref option, pahole will create split BTF using this API, and store > >> + * the base reference BTF in the .BTF.base.reference section. > >> + */ > >> +struct btf *btf__new_split_base_ref(struct btf *src_btf) > > > > I find this name very confusing, for some reason. btf__new*/btf__parse > > is generally used to either create a new empty instance, or just > > parse/index pre-existing BTF information. In this case I feel like > > it's actually a different operation, where we create a derived BTF, so > > I'd name it distinctively. How do you feel about something with > > "distill" terminology as an action performed here. E.g., > > btf__distill_base() or something like that? > > > > The API contract is also suboptimal, given back split BTF where base > > BTF is owned (and has to be freed). Let's keep it a bit more explicit, > > something like: > > > > int btf__distill_base(struct btf *new_base_btf, struct btf *new_split_btf); > > > > WDYT? > > > > Yeah, any suggestions around naming are very welcome! My first choice > was "minimal base" but that seems to prioritize the minimization, which > is really just a side-effect of the desire to add context to the > references in the split BTF. "base reference" feels clumsy to me too. > Ideally we want to find a name that signals that the base BTF is a > companion to the associated split BTF, adding context for it. Describing > the operation as distilling certainly connotes that we're removing > non-essential info; so we'd end up with > > int btf__distill_base(const struct btf *src_btf, struct btf > **new_base_btf, struct btf **new_split_btf); > > ...and we could call the section .BTF.distilled.base perhaps. > We can get cute with analogies, but I'm sure people will hate it. :) What do you get after you remove all the non-essential cruft (distilling the essence)? A "nugget" of goodness... ;) I'd keep it short as .BTF.base (and know that it's this "distilled" base BTF, not a real one), but I don't feel very strongly about this. > > > >> +{ > >> + unsigned int n = btf__type_cnt(src_btf); > >> + const struct btf_type *t; > >> + struct btf_ref ref = {}; > >> + struct btf *base_btf = NULL, *split_btf = NULL; > >> + __u32 i, id = 0; > >> + int ret = 0; > >> + > >> + /* src BTF must be split BTF. */ > >> + if (!btf__base_btf(src_btf)) { > >> + errno = -EINVAL; > >> + return NULL; > >> + } > >> + base_btf = btf__new_empty(); > > [...]