Re: [RFC bpf-next 1/2] bpf: support standalone BTF in modules

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 22 Nov 2022 10:12:05 -0800

On Tue, Nov 22, 2022 at 9:36 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 09/11/2022 22:43, Andrii Nakryiko wrote:
> > On Mon, Nov 7, 2022 at 8:37 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>
> >> On 05/11/2022 22:54, Alexei Starovoitov wrote:
> >>> On Fri, Nov 4, 2022 at 8:58 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>>>
> >>>> Not all kernel modules can be built in-tree when the core
> >>>> kernel is built. This presents a problem for split BTF, because
> >>>> split module BTF refers to type ids in the base kernel BTF, and
> >>>> if that base kernel BTF changes (even in minor ways) those
> >>>> references become invalid.  Such modules then cannot take
> >>>> advantage of BTF (or at least they only can until the kernel
> >>>> changes enough to invalidate their vmlinux type id references).
> >>>> This problem has been discussed before, and the initial approach
> >>>> was to allow BTF mismatch but fail to load BTF.  See [1]
> >>>> for more discussion.
> >>>>
> >>>> Generating standalone BTF for modules helps solve this problem
> >>>> because the BTF generated is self-referential only.  However,
> >>>> tooling is geared towards split BTF - for example bpftool assumes
> >>>> a module's BTF is defined relative to vmlinux BTF.  To handle
> >>>> this, dynamic remapping of standalone BTF is done on module
> >>>> load to make it appear like split BTF - type ids and string
> >>>> offsets are remapped such that they appear as they would in
> >>>> split BTF.  It just so happens that the BTF is self-referential.
> >>>> With this approach, existing tooling works with standalone
> >>>> module BTF from /sys/kernel/btf in the same way as before;
> >>>> no knowledge of split versus standalone BTF is required.
> >>>>
> >>>> Currently, the approach taken is to assume that the BTF
> >>>> associated with a module is split BTF.  If however the
> >>>> checking of types fails, we fall back to interpreting it as
> >>>> standalone BTF and carrying out remapping.  As discussed in [1]
> >>>> there are some heuristics we could use to identify standalone
> >>>> versus split module BTF, but for now the simplistic fallback
> >>>> method is used.
> >>>>
> >>>> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> >>>>
> >>>> [1] https://lore.kernel.org/bpf/YfK18x%2FXrYL4Vw8o@syu-laptop/
> >>>> ---
> >>>>  kernel/bpf/btf.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 132 insertions(+)
> >>>>
> >>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>> index 5579ff3..5efdcaf 100644
> >>>> --- a/kernel/bpf/btf.c
> >>>> +++ b/kernel/bpf/btf.c
> >>>> @@ -5315,11 +5315,120 @@ struct btf *btf_parse_vmlinux(void)
> >>>>
> >>>>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> >>>>
> >>>> +static u32 btf_name_off_renumber(struct btf *btf, u32 name_off)
> >>>> +{
> >>>> +       return name_off + btf->start_str_off;
> >>>> +}
> >>>> +
> >>>> +static u32 btf_id_renumber(struct btf *btf, u32 id)
> >>>> +{
> >>>> +       /* no need to renumber void */
> >>>> +       if (id == 0)
> >>>> +               return id;
> >>>> +       return id + btf->start_id - 1;
> >>>> +}
> >>>> +
> >>>> +/* Renumber standalone BTF to appear as split BTF; name offsets must
> >>>> + * be relative to btf->start_str_offset and ids relative to btf->start_id.
> >>>> + * When user sees BTF it will appear as normal module split BTF, the only
> >>>> + * difference being it is fully self-referential and does not refer back
> >>>> + * to vmlinux BTF (aside from 0 "void" references).
> >>>> + */
> >>>> +static void btf_type_renumber(struct btf_verifier_env *env, struct btf_type *t)
> >>>> +{
> >>>> +       struct btf_var_secinfo *secinfo;
> >>>> +       struct btf *btf = env->btf;
> >>>> +       struct btf_member *member;
> >>>> +       struct btf_param *param;
> >>>> +       struct btf_array *array;
> >>>> +       struct btf_enum64 *e64;
> >>>> +       struct btf_enum *e;
> >>>> +       int i;
> >>>> +
> >>>> +       t->name_off = btf_name_off_renumber(btf, t->name_off);
> >>>> +
> >>>> +       switch (BTF_INFO_KIND(t->info)) {
> >>>> +       case BTF_KIND_INT:
> >>>> +       case BTF_KIND_FLOAT:
> >>>> +       case BTF_KIND_TYPE_TAG:
> >>>> +               /* nothing to renumber here, no type references */
> >>>> +               break;
> >>>> +       case BTF_KIND_PTR:
> >>>> +       case BTF_KIND_FWD:
> >>>> +       case BTF_KIND_TYPEDEF:
> >>>> +       case BTF_KIND_VOLATILE:
> >>>> +       case BTF_KIND_CONST:
> >>>> +       case BTF_KIND_RESTRICT:
> >>>> +       case BTF_KIND_FUNC:
> >>>> +       case BTF_KIND_VAR:
> >>>> +       case BTF_KIND_DECL_TAG:
> >>>> +               /* renumber the referenced type */
> >>>> +               t->type = btf_id_renumber(btf, t->type);
> >>>> +               break;
> >>>> +       case BTF_KIND_ARRAY:
> >>>> +               array = btf_array(t);
> >>>> +               array->type = btf_id_renumber(btf, array->type);
> >>>> +               array->index_type = btf_id_renumber(btf, array->index_type);
> >>>> +               break;
> >>>> +       case BTF_KIND_STRUCT:
> >>>> +       case BTF_KIND_UNION:
> >>>> +               member = (struct btf_member *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       member->type = btf_id_renumber(btf, member->type);
> >>>> +                       member->name_off = btf_name_off_renumber(btf, member->name_off);
> >>>> +                       member++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_FUNC_PROTO:
> >>>> +               param = (struct btf_param *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       param->type = btf_id_renumber(btf, param->type);
> >>>> +                       param->name_off = btf_name_off_renumber(btf, param->name_off);
> >>>> +                       param++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_DATASEC:
> >>>> +               secinfo = (struct btf_var_secinfo *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       secinfo->type = btf_id_renumber(btf, secinfo->type);
> >>>> +                       secinfo++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_ENUM:
> >>>> +               e = (struct btf_enum *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       e->name_off = btf_name_off_renumber(btf, e->name_off);
> >>>> +                       e++;
> >>>> +               }
> >>>> +               break;
> >>>> +       case BTF_KIND_ENUM64:
> >>>> +               e64 = (struct btf_enum64 *)(t + 1);
> >>>> +               for (i = 0; i < btf_type_vlen(t); i++) {
> >>>> +                       e64->name_off = btf_name_off_renumber(btf, e64->name_off);
> >>>> +                       e64++;
> >>>> +               }
> >>>> +               break;
> >>>> +       }
> >>>> +}
> >>>> +
> >>>> +static void btf_renumber(struct btf_verifier_env *env, struct btf *base_btf)
> >>>> +{
> >>>> +       struct btf *btf = env->btf;
> >>>> +       int i;
> >>>> +
> >>>> +       btf->start_id = base_btf->nr_types;
> >>>> +       btf->start_str_off = base_btf->hdr.str_len;
> >>>> +
> >>>> +       for (i = 0; i < btf->nr_types; i++)
> >>>> +               btf_type_renumber(env, btf->types[i]);
> >>>> +}
> >>>> +
> >>>>  static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
> >>>>  {
> >>>>         struct btf_verifier_env *env = NULL;
> >>>>         struct bpf_verifier_log *log;
> >>>>         struct btf *btf = NULL, *base_btf;
> >>>> +       bool standalone = false;
> >>>>         int err;
> >>>>
> >>>>         base_btf = bpf_get_btf_vmlinux();
> >>>> @@ -5367,9 +5476,32 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
> >>>>                 goto errout;
> >>>>
> >>>>         err = btf_check_all_metas(env);
> >>>> +       if (err) {
> >>>> +               /* BTF may be standalone; in that case meta checks will
> >>>> +                * fail and we fall back to standalone BTF processing.
> >>>> +                * Later on, once we have checked all metas, we will
> >>>> +                * retain start id from  base BTF so it will look like
> >>>> +                * split BTF (but is self-contained); renumbering is done
> >>>> +                * also to give the split BTF-like appearance and not
> >>>> +                * confuse pahole which assumes split BTF for modules.
> >>>> +                */
> >>>> +               btf->base_btf = NULL;
> >>>> +               if (btf->types)
> >>>> +                       kvfree(btf->types);
> >>>> +               btf->types = NULL;
> >>>> +               btf->types_size = 0;
> >>>> +               btf->start_id = 0;
> >>>> +               btf->nr_types = 0;
> >>>> +               btf->start_str_off = 0;
> >>>> +               standalone = true;
> >>>> +               err = btf_check_all_metas(env);
> >>>> +       }
> >>>
> >>> Interesting idea!
> >>> Instead of failing the first time, how about we make
> >>> such standalone module BTFs explicit?
> >>> Some flag or special type?
> >>> Then the kernel just checks that and renumbers right away.
> >>>
> >>
> >> I was thinking that might be one way to do it, perhaps even
> >> a .BTF_standalone section name or somesuch as a signal we
> >> are dealing with standalone BTF. However I _think_
> >> we can actually determine the module BTF is standalone
> >> without needing to change anything in the toolchain (I
> >> think adding flags would require that).
> >
> > Why not just extend btf_header to contains extra information where we
> > can record whether it is stand-alone or split, what's the checksum of
> > base BTF, etc, etc. Yes, we'll need to teach libbpf and some tools
> > about this v2 of btf_header, but it's also an opportunity to make BTF
> > a bit more self-describing. E.g., right now there is a pretty big
> > problem that when we add new BTF_KIND_XXX, no existing tooling will be
> > able to do anything with BTF that contains that new kind, even if that
> > kind is completely optional and uninteresting for most tools (e.g., if
> > some particular tool didn't care about DECL_TAG). So with v2 we can
> > record a small table that records each kind's size: extra info size
> > and per-element size (for types that have vlen>0).
> >
> > More upfront work, but solves few existing problems and we can reserve
> > space for future fields as well.
> >
> > WDYT?
> >
>
> Sorry Andrii, missed this reply. With respect to self-describing BTF,
> I've got a patch series that I was planning on sending out soon which
> approaches this by describing BTF kind encodings in BTF. The handy thing
> about that is as you say it allows us to parse BTF even if we don't
> actually use features, so when new kinds are added we can skip past
> them, but if a new libbpf comes along we can potentially unlock these
> features. This is particularly valuable for kernel/module BTF since
> the kernel might be around for a while and we would rather encode
> BTF optimistically. The other useful thing is it won't itself require
> any BTF format changes; it's simply a matter of adding a libbpf call
> to pahole which says "encode BTF kinds", and that part is done using
> basic BTF kinds like typedefs+structs. The benefit of doing this in
> BTF is that we don't need to worry about header incompatibility etc.
> Just adds a few hundred bytes to overall kernel BTF too, since kind
> encodings are only needed for vmlinux (or standalone) BTF. We also
> end up with BTF kind structures in vmlinux.h, which could potentially
> simplify BTF introspection in BPF programs in the future also.
>
> The handy thing about this approach is also that the kernel code
> that parses the BTF kind descriptions is easy to backport, so we
> could even look at backporting that patch to stable kernels such
> that they would be an a position to parse newer BTF kinds even
> if they could not use them. Because that parsing is based around
> interpreting existing BTF kinds, it is minimally invasive.
>
> I'll send it out as an RFC shortly to provide additional context.

I'm not clear how that handles kinds with multiple entities (like
fields for STRUCTs, args for FUNC_PROTO, etc), but I'll look at RFC.
I think a small table of kind ID -> prefix size + per-element size
mappings is useful for any tool that works with BTF, because it allows
you to skip unknown stuff. And it works not just for vmlinux BTF.

>
> Alan