Re: [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 21 May 2024 14:48:08 -0700

On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required.  Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF.  Here the attempt is made
> to store additional context to make split BTF more robust.
>
> This context comes in the form of distilled base BTF providing minimal
> information (name and - in some cases - size) for base INTs, FLOATs,
> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> points at that base and contains any additional types needed (such as
> TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
> information constitutes the minimal BTF representation needed to
> disambiguate or remove split BTF references to base BTF.  The rules
> are as follows:
>
> - INT, FLOAT are recorded in full.
> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
>   will be encoded either as a zero-member sized STRUCT/UNION (preserving
>   size for later relocation checks) or as a named FWD.  Only base BTF
>   STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
>   that have multiple STRUCT/UNION instances of the same name need to
>   preserve size information, so a FWD representation will be used in
>   most cases.
> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
>   with no values) is used.
> - in all other cases, the type is added to the new split BTF.
>
> Avoiding struct/union/enum/enum64 expansion is important to keep the
> distilled base BTF representation to a minimum size.
>
> When successful, new representations of the distilled base BTF and new
> split BTF that refers to it are returned.  Both need to be freed by the
> caller.
>
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate distilled base BTF with a
> FWD struct sk_buff, and the split BTF will refer to it instead.
>
> Tools like pahole can utilize such split BTF to populate the .BTF
> section (split BTF) and an additional .BTF.base section.  Then
> when the split BTF is loaded, the distilled base BTF can be used
> to relocate split BTF to reference the current (and possibly changed)
> base BTF.
>
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated,  we can use the distilled base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (relocated) base BTF sk_buff type id.
>
> Distilled base BTF is small; when building a kernel with all modules
> using distilled base BTF as a test, ovreall module size grew by only

typo: overall

> 5.3Mb total across ~2700 modules.
>
> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> ---
>  tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
>  tools/lib/bpf/btf.h      |  20 ++
>  tools/lib/bpf/libbpf.map |   1 +
>  3 files changed, 424 insertions(+), 6 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..953929d196c3 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
>         return 0;
>  }
>
> -int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
>  {
> -       struct btf_pipe p = { .src = src_btf, .dst = btf };
>         struct btf_type *t;
>         int sz, err;
>
> @@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
>                 return libbpf_err(sz);
>
>         /* deconstruct BTF, if necessary, and invalidate raw_data */
> -       if (btf_ensure_modifiable(btf))
> +       if (btf_ensure_modifiable(p->dst))
>                 return libbpf_err(-ENOMEM);
>
> -       t = btf_add_type_mem(btf, sz);
> +       t = btf_add_type_mem(p->dst, sz);
>         if (!t)
>                 return libbpf_err(-ENOMEM);
>
>         memcpy(t, src_type, sz);
>
> -       err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
> +       err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
>         if (err)
>                 return libbpf_err(err);
>
> -       return btf_commit_type(btf, sz);
> +       return btf_commit_type(p->dst, sz);
> +}
> +
> +int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +{
> +       struct btf_pipe p = { .src = src_btf, .dst = btf };
> +
> +       return btf_add_type(&p, src_type);
>  }
>
>  static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
> @@ -5212,3 +5218,394 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>
>         return 0;
>  }
> +
> +#define BTF_NEEDS_SIZE (1 << 31)       /* flag set if either struct/union is
> +                                        * embedded - and thus size info must
> +                                        * be preserved - or if there are
> +                                        * multiple instances of the same
> +                                        * struct/union - where size can be
> +                                        * used to clarify which is wanted.
> +                                        */

please use full line comment in front of #define

> +#define BTF_ID(id)             (id & ~BTF_NEEDS_SIZE)
> +
> +struct btf_distill {
> +       struct btf_pipe pipe;
> +       int *ids;

reading the rest of the code, this BTF_NEEDS_SIZE and BTF_ID() macro
use was quite distracting. I'm wondering if it would be better to use
a simple struct with bitfields here, e.g.,

struct type_state {
    int id: 31;
    bool needs_size;
};

struct dist_state *states;

Same memory usage, same efficiency, but more readable code. WDYT?

> +       unsigned int split_start_id;
> +       unsigned int split_start_str;
> +       unsigned int diff_id;
> +};
> +
> +/* Check if a member of a split BTF struct/union refers to a base BTF

nit: comments uses "check" terminology, function name uses "find",
while really it "marks" some time as embedded... Let's use consistent
terminology (where mark seems like the most fitting, IMO)

> + * struct/union.  Members can be const/restrict/volatile/typedef
> + * reference types, but if a pointer is encountered, type is no longer
> + * considered embedded.
> + */
> +static int btf_find_embedded_composite_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_distill *dist = ctx;
> +       const struct btf_type *t;
> +       __u32 next_id = *id;
> +
> +       do {
> +               if (next_id == 0)
> +                       return 0;
> +               t = btf_type_by_id(dist->pipe.src, next_id);
> +               switch (btf_kind(t)) {
> +               case BTF_KIND_CONST:
> +               case BTF_KIND_RESTRICT:
> +               case BTF_KIND_VOLATILE:
> +               case BTF_KIND_TYPEDEF:
> +               case BTF_KIND_TYPE_TAG:
> +                       next_id = t->type;
> +                       break;
> +               case BTF_KIND_ARRAY: {
> +                       struct btf_array *a = btf_array(t);
> +
> +                       next_id = a->type;
> +                       break;
> +               }
> +               case BTF_KIND_STRUCT:
> +               case BTF_KIND_UNION:
> +                       dist->ids[next_id] |= BTF_NEEDS_SIZE;
> +                       return 0;
> +               default:
> +                       return 0;
> +               }
> +
> +       } while (1);

nit: while (true)

> +
> +       return 0;
> +}
> +
> +/* Check if composite type has a duplicate-named type; if it does, retain

see above about check vs mark, here at least the function name uses "mark" :)

> + * size information to help guide later relocation towards the correct type.
> + * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
> + * one is size 112, the other 16.  Having size information allows relocation
> + * to choose the right one.

re: this dma_chan and similar cases. Is it ever a problem where a
module actually needs two dma_chan in distilled base BTF? Name
conflicts do happen, but intuitively I'd expect this to happen between
some vmlinux-internal (so to speak, i.e., not the type used in
exported functions/types) and kernel module-specific types. It would
be awkward for the same module to use two different types that are
named the same.

Have you seen this as a problem in practice? What if for now we just
error out if there are two conflicting types required in distilled
BTF?

> + */
> +static int btf_mark_composite_dups(struct btf_distill *dist, __u32 id)
> +{
> +       __u8 *cnt = calloc(dist->split_start_str, sizeof(__u8));

nit: we generally follow that initialization of variable shouldn't be
doing non-trivial operations. So let's do calloc() as a separate
statement outside of variable declaration.

> +       struct btf_type *t;
> +       int i;
> +
> +       if (!cnt)
> +               return -ENOMEM;
> +
> +       /* First pass; collect name counts for composite types. */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               if (!btf_is_composite(t) || !t->name_off)
> +                       continue;
> +               if (cnt[t->name_off] < 255)
> +                       cnt[t->name_off]++;
> +       }
> +       /* Second pass; mark composite types with multiple instances of the
> +        * same name as needing size information.
> +        */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               /* id not needed or is already preserving size information */
> +               if (!dist->ids[i] || (dist->ids[i] & BTF_NEEDS_SIZE))
> +                       continue;
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               if (!btf_is_composite(t) || !t->name_off)
> +                       continue;
> +               if (cnt[t->name_off] > 1)
> +                       dist->ids[i] |= BTF_NEEDS_SIZE;
> +       }
> +       free(cnt);
> +
> +       return 0;
> +}
> +
> +static bool btf_is_eligible_named_fwd(const struct btf_type *t)
> +{
> +       return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
> +}
> +
> +static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_distill *dist = ctx;
> +       struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
> +       int err;
> +
> +       if (!*id)
> +               return 0;
> +       /* split BTF id, not needed */
> +       if (*id >= dist->split_start_id)
> +               return 0;
> +       /* already added ? */
> +       if (BTF_ID(dist->ids[*id]) > 0)
> +               return 0;
> +
> +       /* only a subset of base BTF types should be referenced from split
> +        * BTF; ensure nothing unexpected is referenced.
> +        */
> +       switch (btf_kind(t)) {
> +       case BTF_KIND_INT:
> +       case BTF_KIND_FLOAT:
> +       case BTF_KIND_FWD:
> +       case BTF_KIND_ARRAY:
> +       case BTF_KIND_STRUCT:
> +       case BTF_KIND_UNION:
> +       case BTF_KIND_TYPEDEF:
> +       case BTF_KIND_ENUM:
> +       case BTF_KIND_ENUM64:
> +       case BTF_KIND_PTR:
> +       case BTF_KIND_CONST:
> +       case BTF_KIND_RESTRICT:
> +       case BTF_KIND_VOLATILE:
> +       case BTF_KIND_FUNC_PROTO:
> +       case BTF_KIND_TYPE_TAG:
> +               dist->ids[*id] |= *id;
> +               break;
> +       default:
> +               pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
> +                       *id, btf_kind(t));
> +               return -EINVAL;
> +       }
> +
> +       /* struct/union members not needed, except for anonymous structs
> +        * and unions, which we need since name won't help us determine
> +        * matches; so if a named struct/union, no need to recurse
> +        * into members.
> +        */

is this an outdated comment? we shouldn't have anonymous types in the
distilled base, right?

> +       if (btf_is_eligible_named_fwd(t))
> +               return 0;
> +
> +       /* ensure references in type are added also. */
> +       err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
> +       if (err < 0)
> +               return err;
> +       return 0;

nit: could be just `return btf_type_visit_type_ids(...);`?

> +}
> +
> +static int btf_add_distilled_types(struct btf_distill *dist)
> +{
> +       bool adding_to_base = dist->pipe.dst->start_id == 1;
> +       int id = btf__type_cnt(dist->pipe.dst);
> +       struct btf_type *t;
> +       int i, err = 0;
> +
> +       /* Add types for each of the required references to either distilled
> +        * base or split BTF, depending on type characteristics.
> +        */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               const char *name;
> +               int kind;
> +
> +               if (!BTF_ID(dist->ids[i]))
> +                       continue;
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               kind = btf_kind(t);
> +               name = btf__name_by_offset(dist->pipe.src, t->name_off);
> +
> +               /* Named int, float, fwd struct, union, enum[64] are added to
> +                * base; everything else is added to split BTF.
> +                */
> +               switch (kind) {
> +               case BTF_KIND_INT:
> +               case BTF_KIND_FLOAT:
> +               case BTF_KIND_FWD:
> +               case BTF_KIND_STRUCT:
> +               case BTF_KIND_UNION:
> +               case BTF_KIND_ENUM:
> +               case BTF_KIND_ENUM64:
> +                       if ((adding_to_base && !t->name_off) || (!adding_to_base && t->name_off))
> +                               continue;
> +                       break;
> +               default:
> +                       if (adding_to_base)
> +                               continue;
> +                       break;
> +               }
> +               if (dist->ids[i] & BTF_NEEDS_SIZE) {
> +                       /* If a named struct/union in base BTF is referenced as a type
> +                        * in split BTF without use of a pointer - i.e. as an embedded
> +                        * struct/union - add an empty struct/union preserving size
> +                        * since size must be consistent when relocating split and
> +                        * possibly changed base BTF.  Similarly, when a struct/union
> +                        * has multiple instances of the same name in the original
> +                        * base BTF, retain size to help relocation later pick the
> +                        * right struct/union.
> +                        */
> +                       err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
> +               } else if (btf_is_eligible_named_fwd(t)) {
> +                       /* If not embedded, use a fwd for named struct/unions since we
> +                        * can match via name without any other details.
> +                        */
> +                       switch (kind) {
> +                       case BTF_KIND_STRUCT:
> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_STRUCT);
> +                               break;
> +                       case BTF_KIND_UNION:
> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_UNION);
> +                               break;
> +                       case BTF_KIND_ENUM:
> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
> +                               break;
> +                       case BTF_KIND_ENUM64:

nit: combine ENUM/ENUM64 cases?

> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
> +                               break;
> +                       default:
> +                               pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
> +                                       btf_kind(t));
> +                               return -EINVAL;
> +                       }
> +               } else {
> +                       err = btf_add_type(&dist->pipe, t);

So this should never happen if adding_to_base == true, is that right?
Can we check this? Or am I missing something as usual?

> +               }
> +               if (err < 0)
> +                       break;
> +               dist->ids[i] = id++;
> +       }
> +       return err;
> +}
> +

[...]

> +       n = btf__type_cnt(new_split);
> +       /* Now update base/split BTF ids. */
> +       for (i = 1; i < n; i++) {
> +               t = btf_type_by_id(new_split, i);
> +
> +               err = btf_type_visit_type_ids(t, btf_update_distilled_type_ids, &dist);
> +               if (err < 0)
> +                       goto err_out;
> +       }
> +       free(dist.ids);
> +       hashmap__free(dist.pipe.str_off_map);
> +       *new_base_btf = new_base;
> +       *new_split_btf = new_split;
> +       return 0;
> +err_out:
> +       free(dist.ids);
> +       if (!IS_ERR(dist.pipe.str_off_map))

you don't need to check this, hashmap__free() works for IS_ERR()
pointers as well

> +               hashmap__free(dist.pipe.str_off_map);
> +       btf__free(new_split);
> +       btf__free(new_base);
> +       return libbpf_err(err);
> +}

[...]