On Thu, Jun 13, 2024 at 2:50 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required. Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF. Here the attempt is made
> to store additional context to make split BTF more robust.
>
> This context comes in the form of distilled base BTF providing minimal
> information (name and - in some cases - size) for base INTs, FLOATs,
> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> points at that base and contains any additional types needed (such as
> TYPEDEF, PTR and anonymous STRUCT/UNION declarations). This
> information constitutes the minimal BTF representation needed to
> disambiguate or remove split BTF references to base BTF. The rules
> are as follows:
>
> - INT, FLOAT, FWD are recorded in full.
> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
>   will be encoded as a zero-member sized STRUCT/UNION (preserving
>   size for later relocation checks). Only base BTF STRUCT/UNIONs
>   that are either embedded in split BTF STRUCT/UNIONs or that have
>   multiple STRUCT/UNION instances of the same name will _need_ size
>   checks at relocation time, but as it is possible a different set of
>   types will be duplicates in the later to-be-resolved base BTF,
>   we preserve size information for all named STRUCT/UNIONs.
> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
>   with no values) of the same size is used.
> - in all other cases, the type is added to the new split BTF.
>
> Avoiding struct/union/enum/enum64 expansion is important to keep the
> distilled base BTF representation to a minimum size.
>
> When successful, new representations of the distilled base BTF and new
> split BTF that refers to it are returned. Both need to be freed by the
> caller.
>
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate distilled base BTF with a
> 0-member STRUCT sk_buff of the appropriate size, and the split BTF
> will refer to it instead.
>
> Tools like pahole can utilize such split BTF to populate the .BTF
> section (split BTF) and an additional .BTF.base section. Then
> when the split BTF is loaded, the distilled base BTF can be used
> to relocate split BTF to reference the current (and possibly changed)
> base BTF.
>
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated, we can use the distilled base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (relocated) base BTF sk_buff type id.
>
> Distilled base BTF is small; when building a kernel with all modules
> using distilled base BTF as a test, overall module size grew by only
> 5.3Mb total across ~2700 modules.
>
> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> ---
>  tools/lib/bpf/btf.c      | 319 ++++++++++++++++++++++++++++++++++++++-
>  tools/lib/bpf/btf.h      |  21 +++
>  tools/lib/bpf/libbpf.map |   1 +
>  3 files changed, 335 insertions(+), 6 deletions(-)
>

[...]

> +/* Create updated split BTF with distilled base BTF; distilled base BTF
> + * consists of BTF information required to clarify the types that split
> + * BTF refers to, omitting unneeded details. Specifically it will contain
> + * base types and memberless definitions of named structs, unions and enumerated
> + * types. Associated reference types like pointers, arrays and anonymous
> + * structs, unions and enumerated types will be added to split BTF.
> + * Size is recorded for named struct/unions to help guide matching to the
> + * target base BTF during later relocation.
> + *
> + * The only case where structs, unions or enumerated types are fully represented
> + * is when they are anonymous; in such cases, the anonymous type is added to
> + * split BTF in full.
> + *
> + * We return newly-created split BTF where the split BTF refers to a newly-created
> + * distilled base BTF. Both must be freed separately by the caller.
> + */
> +int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
> +		      struct btf **new_split_btf)
> +{
> +	struct btf *new_base = NULL, *new_split = NULL;
> +	const struct btf *old_base;
> +	unsigned int n = btf__type_cnt(src_btf);
> +	struct btf_distill dist = {};
> +	struct btf_type *t;
> +	int i, err = 0;
> +
> +	/* src BTF must be split BTF. */
> +	old_base = btf__base_btf(src_btf);
> +	if (!new_base_btf || !new_split_btf || !old_base)
> +		return libbpf_err(-EINVAL);
> +
> +	new_base = btf__new_empty();
> +	if (!new_base)
> +		return libbpf_err(-ENOMEM);
> +	dist.id_map = calloc(n, sizeof(*dist.id_map));
> +	if (!dist.id_map) {
> +		err = -ENOMEM;
> +		goto done;
> +	}
> +	dist.pipe.src = src_btf;
> +	dist.pipe.dst = new_base;
> +	dist.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
> +	if (IS_ERR(dist.pipe.str_off_map)) {
> +		err = -ENOMEM;
> +		goto done;
> +	}
> +	dist.split_start_id = btf__type_cnt(old_base);
> +	dist.split_start_str = old_base->hdr->str_len;
> +
> +	/* Pass over src split BTF; generate the list of base BTF type ids it
> +	 * references; these will constitute our distilled BTF set to be
> +	 * distributed over base and split BTF as appropriate.
> +	 */
> +	for (i = src_btf->start_id; i < n; i++) {
> +		err = btf_add_distilled_type_ids(&dist, i);
> +		if (err < 0)
> +			goto done;
> +	}
> +	/* Next add types for each of the required references to base BTF and split BTF
> +	 * in turn.
> +	 */
> +	err = btf_add_distilled_types(&dist);
> +	if (err < 0)
> +		goto done;
> +
> +	/* Create new split BTF with distilled base BTF as its base; the final
> +	 * state is split BTF with distilled base BTF that represents enough
> +	 * about its base references to allow it to be relocated with the base
> +	 * BTF available.
> +	 */
> +	new_split = btf__new_empty_split(new_base);
> +	if (!new_split_btf) {

Coverity points out that new_split_btf probably isn't what should be
checked here. It was already validated as non-NULL at the top of the
function, so this check can never fire and a failed
btf__new_empty_split() would go unnoticed. I think this was meant to be
"new_split", is that right? Can you please send a quick fix? Thanks!

> +		err = -errno;
> +		goto done;
> +	}
> +	dist.pipe.dst = new_split;
> +	/* First add all split types */
> +	for (i = src_btf->start_id; i < n; i++) {
> +		t = btf_type_by_id(src_btf, i);
> +		err = btf_add_type(&dist.pipe, t);
> +		if (err < 0)
> +			goto done;
> +	}
> +	/* Now add distilled types to split BTF that are not added to base. */
> +	err = btf_add_distilled_types(&dist);
> +	if (err < 0)
> +		goto done;
> +
> +	/* All split BTF ids will be shifted downwards since there are less base
> +	 * BTF ids in distilled base BTF.
> +	 */
> +	dist.diff_id = dist.split_start_id - btf__type_cnt(new_base);
> +
> +	n = btf__type_cnt(new_split);
> +	/* Now update base/split BTF ids. */
> +	for (i = 1; i < n; i++) {
> +		err = btf_update_distilled_type_ids(&dist, i);
> +		if (err < 0)
> +			break;
> +	}
> +done:
> +	free(dist.id_map);
> +	hashmap__free(dist.pipe.str_off_map);
> +	if (err) {
> +		btf__free(new_split);
> +		btf__free(new_base);
> +		return libbpf_err(err);
> +	}
> +	*new_base_btf = new_base;
> +	*new_split_btf = new_split;
> +
> +	return 0;
> +}

[...]
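
To illustrate the pattern in question outside of libbpf, here is a
minimal standalone sketch. It is not the actual fix and uses no libbpf
code: struct obj, obj_new(), make_pair() and allocs_before_failure are
all hypothetical stand-ins invented for this example. It only shows the
general shape, assuming the intent is to validate the out-parameters
once up front and then check the local result of each allocation.

```c
#include <assert.h>	/* only needed for the usage checks described below */
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btf; NOT the libbpf type. */
struct obj {
	int id;
};

/* Hypothetical test knob: number of allocations allowed to succeed
 * before obj_new() starts failing; -1 means never fail on purpose.
 */
static int allocs_before_failure = -1;

/* Hypothetical allocator that returns NULL with errno set on failure. */
static struct obj *obj_new(void)
{
	if (allocs_before_failure == 0) {
		errno = ENOMEM;
		return NULL;
	}
	if (allocs_before_failure > 0)
		allocs_before_failure--;
	return calloc(1, sizeof(struct obj));
}

/* Same overall shape as the quoted function: out-parameters are checked
 * once at entry; after that, each allocation result is checked through
 * the local pointer, and outputs are only written on success.
 */
static int make_pair(struct obj **out_base, struct obj **out_split)
{
	struct obj *base = NULL, *split = NULL;
	int err = 0;

	if (!out_base || !out_split)
		return -EINVAL;

	base = obj_new();
	if (!base)
		return -ENOMEM;

	split = obj_new();
	if (!split) {	/* check the local, not out_split */
		err = -errno;
		goto done;
	}
done:
	if (err) {
		free(split);
		free(base);
		return err;
	}
	*out_base = base;
	*out_split = split;
	return 0;
}
```

With allocs_before_failure set to 1, the second obj_new() call fails and
make_pair() returns -ENOMEM without touching the caller's pointers. Had
the check been on out_split instead, the failure would have gone
unnoticed and a NULL split object would have been handed back as
success.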