Re: [RFC bpf-next 1/8] btf: add kind metadata encoding to UAPI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

> On Mon, Jun 5, 2023 at 3:38 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> On Mon, Jun 5, 2023 at 9:15 AM Alexei Starovoitov
>> <alexei.starovoitov@xxxxxxxxx> wrote:
>> >
>> > On Fri, Jun 2, 2023 at 1:34 PM Andrii Nakryiko
>> > <andrii.nakryiko@xxxxxxxxx> wrote:
>> > >
>> > > >
>> > > > > > > >> +
>> > > > > > > >> +struct btf_kind_meta {
>> > > > > > > >> +    __u32 name_off;         /* kind name string offset */
>> > > > >
>> > > > > I'm not sure why we'd need to record this for every KIND? The tool
>> > > > > that doesn't know about this new kind can't do much about it anyways,
>> > > > > so whether it knows that this is "KIND_NEW_FANCY" or just its ID #123
>> > > > > doesn't make much difference?
>> > > >
>> > > > The name is certainly more meaningful than 123.
>> > > > bpftool output is consumed by humans who will be able to tell the difference.
>> > > > I'd keep the name here.
>> > >
>> > > Ok, it's fine. When I was originally proposing this compact metadata,
>> > > I was trying to make it as minimal as possible, so adding 80 bytes
>> > > just for string offset fields (plus a bunch of strings) felt like an
>> > > unnecessary overhead. But it's not a big deal.
>> >
>> > Exactly. It's just 4 * num_kinds bytes in and ~20 * num_kinds for
>> > names, but it implements 'self description'.
>> > Otherwise the names become an external knowledge and BTF is not self described.
>> >
>> >
>> > > >
>> > > > > > > > and would bump the BTF_VERSION to 2 to make it a 'milestone'.
>> > > > >
>> > > > > Bumping BTF_VERSION to 2 automatically makes BTF incompatible with all
>> > > > > existing kernels (and potentially many tools that parse BTF). Given we
>> > > > > can actually extend BTF in backwards compatible way by just adding an
>> > > > > optional two fields to btf_header + extra bytes for metadata sections,
>> > > > > why making our lives harder by bumping this version?
>> > > >
>> > > > I fail to see how bumping the version makes it harder.
>> > > > libbpf needs to sanitize meta* fields in the struct btf_header on
>> > > > older kernels anway. At the same time sanitizing the version from 2 to
>> > > > 1
>> > > > in the same header is one extra line of code in libbpf.
>> > > > What am I missing?
>> > >
>> > > So I checked libbpf code, and libbpf doesn't really check the version
>> > > field. So for the most part this BTF_VERSION bump wouldn't matter for
>> > > any tool that's based on libbpf's struct btf API. But if libbpf did
>> > > check version (as it probably should have), then by upgrading to newer
>> > > Clang that would emit BTF with this metadata (but no new fancy
>> > > BTF_KIND_KERNEL_FUNC or anything like that), we automatically make
>> > > such BTF incompatible with all those tools.
>> > >
>> > > Kernel is a bit different because it's extremely strict about BTF. I'm
>> > > more worried about tools like bpftool (but we don't check BTF_VERSION
>> > > there due to libbpf), llvm-objdump (when it supports BTF), etc.
>> > >
>> > > On the other hand, what do we gain by bumping this BTF_VERSION?
>> >
>> > The version bump will be an indication that
>> > v2 of BTF has enough info in the format for any tool/kernel to consume it.
>> > With v2 we should make BTF_KIND_META description mandatory.
>> > If we keep it as v1 then the presence of BTF_KIND_META would be
>> > an indication of 'self described' format.
>> > Which is also ok-ish, but seems less clean.
>> > zero vs not-zero of meta_off in btf_header is pretty much v1 vs v2.
>> >
>>
>> We had a long offline discussion w/ Alexei about this whole
>> self-describing BTF, and I will try to summarize it here a bit,
>> because I think we both think about "self-describing" differently, and
>> as a result few different aspects are conflated with each other (there
>> are at least 3(!) different things here).
>
> Thanks for summarizing. All correct.
>
>> From my perspective, this self-describing BTF metadata was purely
>> designed to allow tools without latest BTF knowledge to be able to
>> skip over unknown BTF_KIND_xxx, at most being able to tell whether
>> it's critical for understanding BTF (that's the OPTIONAL flag) or not.
>> I.e., with older bpftool (but the one that knows about btf_metadata,
>> of course), it would still be possible to `bpftool btf dump file
>> <file-with-newer-btf-kinds>` just fine, except for new KINDS (which
>> would be just emitted as "unknown BTF_KIND_XXX, skipping...".
>>
>> I think this problem is solved with this fixed + per-vlen sz and those
>> few extra flags.
>
> I'm fine with this approach as long as we don't fool ourselves that
> we're doing a "self described" format.
> We have a "size" field in btf_header. With this btf_metadata extension
> we're effectively adding "size" fields for each btf kind and its vlen part.
> So what Alan proposed:
> +struct btf_kind_meta {
> +       __u16 flags;            /* see BTF_KIND_META_* values above */
> +       __u8 info_sz;           /* size of singular element after btf_type */
> +       __u8 elem_sz;           /* size of each of btf_vlen(t) elements */
> +};
>
> _without_ name_off it makes the most sense.
>
> As soon as we're trying to add 'name_off' to the kind we're falling into
> the trap of thinking that we're adding "self described" format and
> btf_kind_meta needs to actually describe it for printing (with
> real name and not just integer id) and further trying to describe
> semantics of unknown kind with another flag that Andrii's proposed:
> "Another flag I was thinking about was a flag whether struct btf_type's
> type/size field is a type or a size (or something else)."
>
> imo name_off and that other flag in addition to optional_or_not flag
> are carrying the concept too far.
>
> We should just say upfront that this "struct btf_kind_meta" is to be
> able to extend BTF easier and nothing else.
> "old" bpftool will be able to skip unknown kinds, but dedup
> probably won't be able to skip much anyway.
>
> I'd also call it "struct btf_kind_description|layout|sizes"
> to narrow the scope.
> This BTF extension is not going to describe semantics of unknown kinds.
> Instead of "best effort" attempts with flags like "what type/size means"
> let's not even go there.
>
> If we go this simple route I'm fine with hard coded crc and base_crc
> fields. They probably should go to btf_header though.
> We don't need "struct btf_metadata" as well.
> It's making things sound beyond what it actually is.
> btf_header can point to an array of struct btf_kind_description.
> As simple as it can get.
> No need for json like format and key/value things either.
> We're not creating a self described BTF format.
> We're just adding a few size fields.
> The kernel/libbpf/dedup still needs to known semantics of future kinds
> to be able to print/operate on them.

I've only been following this discussion on the sidelines, but FWIW I
agree that it is futile to try to describe semantics of fields inside
the format. Anything that needs to do transformations on the whole of
the BTF is going to have to understand the semantics anyway. And a
pretty-printer can just skip over the fields it doesn't understand and
emit a "unknown type XXX" message when doing so.

I'll also add that I am thrilled with the effort to make sure new BTF
kinds always embed their length so parsers can skip over them; the fact
that the older ones don't is, IMO, one of the biggest flaws of the BTF
format, and I'm thrilled to see it fixed! The "type-length-value with a
'required' flag" is also a pretty standard way to do this in, e.g.,
network protocols.

As for bumping the version number, I don't think it's a good idea to
deliberately break compatibility this way unless it's absolutely
necessary. With "absolutely necessary" meaning "things will break in
subtle ways in any case, so it's better to make the breakage obvious".
But it libbpf is not checking the version field anyway, that becomes
kind of a moot point, as bumping it doesn't really gain us anything,
then...

-Toke





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux