Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 6 Jun 2023 09:50:42 -0700

On Mon, Jun 5, 2023 at 7:46 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > >
> > > > > > > > >> +
> > > > > > > > >> +struct btf_kind_meta {
> > > > > > > > >> +    __u32 name_off;         /* kind name string offset */
> > > > > >
> > > > > > I'm not sure why we'd need to record this for every KIND? The tool
> > > > > > that doesn't know about this new kind can't do much about it anyways,
> > > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
> > > > > > doesn't make much difference?
> > > > >
> > > > > The name is certainly more meaningful than 123.
> > > > > bpftool output is consumed by humans who will be able to tell the difference.
> > > > > I'd keep the name here.
> > > >
> > > > Ok, it's fine. When I was originally proposing this compact metadata,
> > > > I was trying to make it as minimal as possible, so adding 80 bytes
> > > > just for string offset fields (plus a bunch of strings) felt like an
> > > > unnecessary overhead. But it's not a big deal.
> > >
> > > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
> > > names, but it implements 'self description'.
> > > Otherwise the names become an external knowledge and BTF is not self described.
> > >
> > >
> > > > >
> > > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
> > > > > >
> > > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
> > > > > > existing kernels (and potentially many tools that parse BTF). Given we
> > > > > > can actually extend BTF in backwards compatible way by just adding an
> > > > > > optional two fields to btf_header + extra bytes for metadata sections,
> > > > > > why making our lives harder by bumping this version?
> > > > >
> > > > > I fail to see how bumping the version makes it harder.
> > > > > libbpf needs to sanitize meta* fields in the struct btf_header on
> > > > > older kernels anway. At the same time sanitizing the version from 2 to
> > > > > 1
> > > > > in the same header is one extra line of code in libbpf.
> > > > > What am I missing?
> > > >
> > > > So I checked libbpf code, and libbpf doesn't really check the version
> > > > field. So for the most part this BTF_VERSION bump wouldn't matter for
> > > > any tool that's based on libbpf's struct btf API. But if libbpf did
> > > > check version (as it probably should have), then by upgrading to newer
> > > > Clang that would emit BTF with this metadata (but no new fancy
> > > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
> > > > such BTF incompatible with all those tools.
> > > >
> > > > Kernel is a bit different because it's extremely strict about BTF. I'm
> > > > more worried about tools like bpftool (but we don't check BTF_VERSION
> > > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
> > > >
> > > > On the other hand, what do we gain by bumping this BTF_VERSION?
> > >
> > > The version bump will be an indication that
> > > v2 of BTF has enough info in the format for any tool/kernel to consume it.
> > > With v2 we should make BTF_KIND_META description mandatory.
> > > If we keep it as v1 then the presence of BTF_KIND_META would be
> > > an indication of 'self described' format.
> > > Which is also ok-ish, but seems less clean.
> > > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
> > >
> >
> > We had a long offline discussion w/ Alexei about this whole
> > self-describing BTF, and I will try to summarize it here a bit,
> > because I think we both think about "self-describing" differently, and
> > as a result few different aspects are conflated with each other (there
> > are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
> > From my perspective, this self-describing BTF metadata was purely
> > designed to allow tools without latest BTF knowledge to be able to
> > skip over unknown BTF_KIND_xxx, at most being able to tell whether
> > it's critical for understanding BTF (that's the OPTIONAL flag) or not.
> > I.e., with older bpftool (but the one that knows about btf_metadata,
> > of course), it would still be possible to `bpftool btf dump file
> > <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
> > would be just emitted as "unknown BTF_KIND_XXX, skipping...".
> >
> > I think this problem is solved with this fixed + per-vlen sz and those
> > few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.

Yep. I didn't see much need for name_off as well.

>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.

Yep, that was precisely (at least my) intent. It's great that we are
converging on this.

> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.

Agreed, I don't think we can ever make BTF dedup work reliably with
KINDs it doesn't understand. I wouldn't even try. I'd also say that
kernel should keep being strict about this (even if we add
"is-it-optional" field, kernel can't trust it). Libbpf and other
libraries will have to keep sanitizing BTF anyways.

I'm also ok with dropping "optional_or_not" flag. Same for
"does-it-point-to-type" flag. I can see some use for the latter (e.g.,
still being able to calculate sizes and stuff), but it's nothing super
critical, IMO.

>
> I'd also call it "struct btf_kind_description|layout|sizes"

I like btf_kind_layout, it's short and to the point.

> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>

Ack.

> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.

Yep, on btf_header fields. But I'd not hardcode "crc" name. If we are
doing them as strings (which I think we should instead of dooming them
to 32-bit integer crc32 value only), then can we just say generically
that it's either "id" or "checksum"?

But I guess crc32 would be fine in practice as well. So not something
I strongly feel about.

> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.

Agreed. Still, it's a third section, and we should at least have a
count of those btf_kind_layout items somewhere.

> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.

Ack, great.

> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

Yes.