Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 2 Jun 2023 11:11:55 -0700

On Fri, Jun 2, 2023 at 9:32 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Jun 1, 2023 at 9:54 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 1, 2023 at 3:38 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> > >
> > > On 01/06/2023 04:53, Alexei Starovoitov wrote:
> > > > On Wed, May 31, 2023 at 09:19:28PM +0100, Alan Maguire wrote:
> > > >> BTF kind metadata provides information to parse BTF kinds.
> > > >> By separating parsing BTF from using all the information
> > > >> it provides, we allow BTF to encode new features even if
> > > >> they cannot be used.  This is helpful in particular for
> > > >> cases where newer tools for BTF generation run on an
> > > >> older kernel; BTF kinds may be present that the kernel
> > > >> cannot yet use, but at least it can parse the BTF
> > > >> provided.  Meanwhile userspace tools with newer libbpf
> > > >> may be able to use the newer information.
> > > >>
> > > >> The intent is to support encoding of kind metadata
> > > >> optionally so that tools like pahole can add this
> > > >> information.  So for each kind we record
> > > >>
> > > >> - a kind name string
> > > >> - kind-related flags
> > > >> - length of singular element following struct btf_type
> > > >> - length of each of the btf_vlen() elements following
> > > >>
> > > >> In addition we make space in the metadata for
> > > >> CRC32s computed over the BTF along with a CRC for
> > > >> the base BTF; this allows split BTF to identify
> > > >> a mismatch explicitly.  Finally we provide an
> > > >> offset for an optional description string.
> > > >>
> > > >> The ideas here were discussed at [1] hence
> > > >>
> > > >> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > >> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> > > >>
> > > >> [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@xxxxxxxxxxxxxx/
> > > >> ---
> > > >>  include/uapi/linux/btf.h       | 29 +++++++++++++++++++++++++++++
> > > >>  tools/include/uapi/linux/btf.h | 29 +++++++++++++++++++++++++++++
> > > >>  2 files changed, 58 insertions(+)
> > > >>
> > > >> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> > > >> index ec1798b6d3ff..94c1f4518249 100644
> > > >> --- a/include/uapi/linux/btf.h
> > > >> +++ b/include/uapi/linux/btf.h
> > > >> @@ -8,6 +8,34 @@
> > > >>  #define BTF_MAGIC   0xeB9F
> > > >>  #define BTF_VERSION 1
> > > >>
> > > >> +/* is this information required? If so it cannot be sanitized safely. */
> > > >> +#define BTF_KIND_META_OPTIONAL              (1 << 0)
>
> Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else). E.g., let's
> say we haven't had btf_type_tag yet and were adding it after we had
> this new metadata. We could say that type_tag's type/size field is
> actually a type ID, and generic tools like bpftool could basically
> skip type_tag and resolve to underlying type. This way, optional
> modifier/decorator KINDs won't even have to break applications using
> old libbpf's when it comes to calculating type sizes and resolving
> them.

+1
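
A minimal sketch of what such a flag could enable, not code from the
patch. DEMO_KIND_META_REFS_TYPE is a hypothetical flag name, and the
demo_* structs are simplified stand-ins for struct btf_type and
struct btf_kind_meta. A generic tool follows any kind whose metadata
says "my type/size field is a type ID" until it reaches a type it can
size, without knowing what the intermediate kinds mean.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: this kind's type/size field holds a type ID. */
#define DEMO_KIND_META_REFS_TYPE (1u << 1)

struct demo_kind_meta {
	uint32_t flags;
};

struct demo_type {
	uint32_t kind;         /* index into the kind metadata table */
	uint32_t type_or_size; /* meaning depends on the kind */
};

/*
 * Follow reference kinds until reaching a type that is not a reference.
 * Returns the resolved type ID, or 0 (void) if the ID is out of range,
 * the kind has no metadata, or the chain is longer than 32 hops.
 */
static uint32_t demo_resolve(const struct demo_type *types, uint32_t nr_types,
			     const struct demo_kind_meta *meta,
			     uint32_t nr_kinds, uint32_t id)
{
	assert(types && meta);
	for (int depth = 0; depth < 32; depth++) {
		if (id == 0 || id >= nr_types || types[id].kind >= nr_kinds)
			return 0;
		if (!(meta[types[id].kind].flags & DEMO_KIND_META_REFS_TYPE))
			return id;
		id = types[id].type_or_size;
	}
	return 0;
}
```

The depth limit is an arbitrary choice for the sketch; it only guards
against reference cycles in malformed input.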

> > > >> +
> > > >> +struct btf_kind_meta {
> > > >> +    __u32 name_off;         /* kind name string offset */
>
> I'm not sure why we'd need to record this for every KIND? The tool
> that doesn't know about this new kind can't do much about it anyways,
> so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> doesn't make much difference?

The name is certainly more meaningful than 123.
bpftool output is consumed by humans who will be able to tell the difference.
I'd keep the name here.

> > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>
> Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> existing kernels (and potentially many tools that parse BTF). Given we
> can actually extend BTF in a backwards compatible way by just adding
> two optional fields to btf_header + extra bytes for metadata sections,
> why make our lives harder by bumping this version?

I fail to see how bumping the version makes it harder.
libbpf needs to sanitize the meta* fields in struct btf_header on
older kernels anyway. At the same time, sanitizing the version from 2 to 1
in the same header is one extra line of code in libbpf.
What am I missing?
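
A rough sketch of the sanitization being described, under stated
assumptions: the demo_* structs are local stand-ins, the meta_off and
meta_len names are hypothetical, and this is not libbpf code. It only
covers the header fields; dropping the metadata bytes themselves from
the blob is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for today's struct btf_header (version 1 layout). */
struct demo_btf_header_v1 {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off;
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

/* Hypothetical extended header: v1 fields plus a metadata section. */
struct demo_btf_header_v2 {
	struct demo_btf_header_v1 v1;
	uint32_t meta_off; /* assumed name */
	uint32_t meta_len; /* assumed name */
};

/*
 * Make the header look like version 1 to an older consumer.
 * hdr_len is left alone because the type and string offsets are
 * relative to the end of the header.
 */
static void demo_sanitize_header(struct demo_btf_header_v2 *hdr)
{
	assert(hdr);
	hdr->meta_off = 0;
	hdr->meta_len = 0;
	hdr->v1.version = 1; /* the "one extra line" */
}
```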

>
> > > > v2 -> self described.
> > >
> > > sure, sounds good. One other change perhaps worth making; currently
> > > we assume that the kind metadata is at the end of the struct
> > > btf_metadata, but if we ever wanted to add metadata fields in the
> > > future, we'd want to support both the current metadata structure and
> > > any future structure which had additional fields.
>
> see above, another reason to make metadata a separate section, in
> addition to types and strings
>
> > >
> > > With that in mind, it might make sense to go with something like
> > >
> > > struct btf_metadata {
> > >         __u32   kind_meta_cnt;
> > >         __u32   kind_meta_offset;       /* kind_meta_cnt instances of struct btf_kind_meta start here */
> > >         __u32   flags;
> > >         __u32   description_off;        /* optional description string*/
> > >         __u32   crc;                    /* crc32 of BTF */
> > >         __u32   base_crc;               /* crc32 of base BTF */
> > > };
> > >
> > > For the original version, kind_meta_offset would just be
> > > at meta_off + sizeof(struct btf_metadata), but if we had multiple
> > > versions of the btf_metadata header to handle, they could all rely on
> > > the kind_meta_offset being where kind metadata is stored.
> > > For validation we'd have to make sure kind_meta_offset was within
> > > the metadata header range.
> >
> > kind_meta_offset is an ok idea, but I don't quite see why we'd have
> > multiple 'struct btf_metadata' pointing to the same set of 'struct
> > btf_kind_meta'.
> >
> > Also why do we need description_off ? Shouldn't string go into
> > btf_header->str_off ?
> >
> > > >
> > > >> +    __u32   flags;
> > > >> +    __u32   description_off;        /* optional description string */
> > > >> +    __u32   crc;                    /* crc32 of BTF */
> > > >> +    __u32   base_crc;               /* crc32 of base BTF */
> > > >
> > > > Hard coded CRC also gives me a pause.
> > > > Should it be an optional KIND like btf tags?
> > >
> > > The goal of the CRC is really just to provide a unique identifier that
> > > we can use for things like checking if there's a mismatch between
> > > base and module BTF. If we want to ever do CRC validation (not sure
> > > if there's a case for that) we probably need to think about cases like
> > > BTF sanitization of BPF program BTF; this would likely only be an
> > > issue if metadata support is added to BPF compilers.
> > >
> > > The problem with adding it via a kind is that if we first compute
> > > the CRC over the entire BTF object and then add the kind, the addition
> > > of the kind breaks the CRC; as a result I _think_ we're stuck with
> > > having to have it in the header.
> >
> > Hmm. libbpf can add BTF_KIND_CRC with zero-ed u32 crc field
> > and later fill it in.
> > It's really not different than u32 crc field inside 'struct btf_metadata'.
> >
> > > That said I don't think CRC is necessarily the only identifier
> > > we could use, and we don't even need to identify it as a
> > > CRC in the UAPI, just as a "unique identifier"; that would deal
> > > with issues about breaking the CRC during sanitization. All
> > > depends on whether we ever see a need to verify BTF via CRC
> > > really.
> >
> > Right. It could be sha or anything else, but user space and kernel
> > need to agree on the math to compute it, so something has got to indicate
> > that this 32-bit is a crc.
> > Hence KIND_CRC, KIND_SHA fit better.
>
> what if instead of crc and base_crc fields, we have
>
> __u32 id_str_off;
> __u32 base_id_str_off;
>
> and they are offsets into a string section. We can then define that
> those strings have to be something like "crc:<crc-value>" or
> "sha:<sha-value>". This will be a generic ID, and extensible (and more
> easily extensible, probably), but won't require new KIND.

Encoding binary data in strings with \0 and other escape chars?
Ouch. Please no.
We can have variable size KIND_ID and encode crc vs sha in flags,
but binary data better stay binary.
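
One way this could look, as a sketch only. The kind number, the
enum demo_id_algo values and the use of kind_flag to select the
algorithm are all assumptions made for illustration; nothing here is
existing UAPI. The bit positions follow the existing btf_type.info
layout (vlen in bits 0-15, kind in bits 24-28, kind_flag in bit 31),
with vlen reused as the number of raw ID bytes that follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ID algorithms, selected by a flag, not a string prefix. */
enum demo_id_algo { DEMO_ID_CRC32 = 0, DEMO_ID_SHA256 = 1 };

/* Build an info word: algo in bit 31, kind in bits 24-28, byte count in vlen. */
static uint32_t demo_id_info(uint32_t kind, enum demo_id_algo algo,
			     uint32_t id_bytes)
{
	assert(kind < 32 && id_bytes <= 0xffff);
	return ((uint32_t)algo << 31) | (kind << 24) | id_bytes;
}

static enum demo_id_algo demo_id_algo_of(uint32_t info)
{
	return (info >> 31) ? DEMO_ID_SHA256 : DEMO_ID_CRC32;
}

/* Number of raw ID bytes following the type record. */
static uint32_t demo_id_len(uint32_t info)
{
	return info & 0xffff;
}

/* ID bytes rounded up to 4 so the next type record stays aligned. */
static uint32_t demo_id_padded_len(uint32_t info)
{
	return (demo_id_len(info) + 3) & ~3u;
}
```

The ID itself stays raw bytes after the record, so no escaping is
needed and the length is explicit.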

> This also has a good property that 0 means "no ID", which helps with
> the base BTF case. Current "__u32 crc;" doesn't have this property and
> requires a flag.

imo this crc addition is a litmus test for this self-described format.
If we cannot encode it as a new KIND* it means this self-described
idea is broken.
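
For illustration, a sketch of how a parser could step over a kind it
does not understand (a KIND_CRC, say) using only the two per-kind
lengths listed in the patch description. The field names info_sz and
elem_sz, the demo_* structs and kind number 20 in the usage are
assumptions for this example, not the final UAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 12-byte layout of struct btf_type. */
struct demo_btf_type {
	uint32_t name_off;
	uint32_t info;
	uint32_t size_or_type;
};

/* Simplified stand-in for the proposed per-kind metadata. */
struct demo_kind_meta {
	uint8_t info_sz; /* bytes of the singular element after btf_type */
	uint8_t elem_sz; /* bytes of each of the vlen elements */
};

/*
 * Total size of one type record, computed from metadata alone.
 * Returns 0 if the kind has no metadata entry, in which case the
 * parser cannot safely continue.
 */
static size_t demo_record_size(const struct demo_btf_type *t,
			       const struct demo_kind_meta *meta,
			       uint32_t nr_kinds)
{
	uint32_t kind, vlen;

	assert(t && meta);
	kind = (t->info >> 24) & 0x1f;
	vlen = t->info & 0xffff;
	if (kind >= nr_kinds)
		return 0;
	return sizeof(*t) + meta[kind].info_sz +
	       (size_t)vlen * meta[kind].elem_sz;
}
```

With a table where kind 20 has info_sz = 4 and elem_sz = 0, a consumer
that has never heard of that kind still advances by 16 bytes and keeps
parsing.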