[RFC bpf-next 0/5] bpf: making BTF self-describing

Alan Maguire <alan.maguire@xxxxxxxxxx> · Wed, 23 Nov 2022 17:41:47 +0000

One problem with the BPF Type Format (BTF) is that it is hard
to extend.  BTF consists of a set of kinds, each representing
an aspect of type or variable information such as an integer,
struct, array and so on.  The problem is that at the time BTF
is encoded, we do not provide information about the kinds we
have used, so when the encoded BTF is later parsed, the tools
that parse it must know about all the kinds used at encoding
time in order to parse the BTF.  If an unknown kind is found,
we have no way of knowing its size, so we cannot skip past
it and have to give up parsing.
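
To illustrate the problem, here is a minimal sketch of how a
parser typically sizes a type record today.  It is not the
libbpf or kernel code; the SK_/sk_ names are local stand-ins
for the definitions in uapi/linux/btf.h so that the example
compiles on its own.  The size of each record depends on a
per-kind rule that is hardcoded in the parser, so any kind
outside the switch leaves it with nothing to go on.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for definitions in uapi/linux/btf.h, so that the
 * sketch compiles on its own.
 */
#define SK_INFO_KIND(info)	(((info) >> 24) & 0x1f)
#define SK_INFO_VLEN(info)	((info) & 0xffff)
#define SK_TYPE_HDR_SZ		12	/* sizeof(struct btf_type) */

enum {
	SK_INT = 1, SK_PTR, SK_ARRAY, SK_STRUCT, SK_UNION, SK_ENUM, SK_FWD,
	SK_TYPEDEF, SK_VOLATILE, SK_CONST, SK_RESTRICT, SK_FUNC,
	SK_FUNC_PROTO, SK_VAR, SK_DATASEC, SK_FLOAT, SK_DECL_TAG,
	SK_TYPE_TAG, SK_ENUM64,
};

/* Build an info word from a kind and a vlen (kind_flag left clear). */
static uint32_t sk_info(uint32_t kind, uint32_t vlen)
{
	assert(kind <= 0x1f && vlen <= 0xffff);
	return (kind << 24) | vlen;
}

/* Size in bytes of one type record (header plus kind-specific data),
 * or -1 if the kind is not known to this parser.
 */
static int sk_type_size(uint32_t info)
{
	int vlen = SK_INFO_VLEN(info);

	switch (SK_INFO_KIND(info)) {
	case SK_PTR: case SK_FWD: case SK_TYPEDEF: case SK_VOLATILE:
	case SK_CONST: case SK_RESTRICT: case SK_FUNC: case SK_FLOAT:
	case SK_TYPE_TAG:
		return SK_TYPE_HDR_SZ;			/* no trailing data */
	case SK_INT: case SK_VAR: case SK_DECL_TAG:
		return SK_TYPE_HDR_SZ + 4;		/* one 4-byte record */
	case SK_ARRAY:
		return SK_TYPE_HDR_SZ + 12;		/* struct btf_array */
	case SK_ENUM: case SK_FUNC_PROTO:
		return SK_TYPE_HDR_SZ + vlen * 8;	/* btf_enum, btf_param */
	case SK_STRUCT: case SK_UNION: case SK_DATASEC: case SK_ENUM64:
		return SK_TYPE_HDR_SZ + vlen * 12;	/* 12-byte elements */
	default:
		return -1;	/* unknown kind: cannot skip past it */
	}
}
```

A parser built from this table that meets, say, kind 20 gets
-1 back and has to abandon the rest of the type section, even
if every other record in it is one it understands.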

So if BTF is created with a newer toolchain that emits a
new kind, but is later parsed with an older toolchain, it
is unparseable.  Ideally we would like such BTF to be
parseable, so we need a mechanism to encode info about
the kinds used at encoding time that is then easily
accessible to parsing operations.  The alternative is
the current situation, where encoding has to be pessimistic
and we have to skip various kind encodings to avoid parsing
failures.

Here we propose a scheme to encode kind information such
that parsing can proceed.  The following steps are
involved:

1. a libbpf function, btf__add_kinds(), is introduced which
   adds kind information
2. that kind information is encoded in BTF as a set of
   structures representing the kind encodings
3. tools will call btf__add_kinds() at BTF encoding time
   to add this kind encoding information
4. at parsing time, if an unrecognized kind is found, the
   kind encoding is used to determine the size of the
   kind representation and parsing proceeds
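
Step 4 can be sketched as follows.  This is only an
illustration of the idea, not the encoding used in the
patches: struct kd_kind_desc and the kd_ helpers are
hypothetical, and the assumption made here is that each
kind can be described by a fixed number of bytes following
struct btf_type plus a per-element size multiplied by vlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KD_INFO_KIND(info)	(((info) >> 24) & 0x1f)
#define KD_INFO_VLEN(info)	((info) & 0xffff)
#define KD_TYPE_HDR_SZ		12	/* sizeof(struct btf_type) */

/* Hypothetical description of one kind, as recorded by the encoder. */
struct kd_kind_desc {
	uint32_t kind;		/* kind number being described */
	uint32_t fixed_sz;	/* bytes following the header, once per type */
	uint32_t elem_sz;	/* bytes per element, repeated vlen times */
};

/* Look up the description the encoder left for a kind, if any. */
static const struct kd_kind_desc *
kd_find_desc(const struct kd_kind_desc *descs, size_t n, uint32_t kind)
{
	size_t i;

	assert(descs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (descs[i].kind == kind)
			return &descs[i];
	}
	return NULL;
}

/* Size in bytes of a record whose kind the parser does not itself
 * recognize, computed from the encoded description, or -1 if the
 * encoder did not describe that kind either.
 */
static int kd_record_size(const struct kd_kind_desc *descs, size_t n,
			  uint32_t info)
{
	const struct kd_kind_desc *desc;

	desc = kd_find_desc(descs, n, KD_INFO_KIND(info));
	if (!desc)
		return -1;
	return KD_TYPE_HDR_SZ + desc->fixed_sz +
	       KD_INFO_VLEN(info) * desc->elem_sz;
}
```

With descriptions of this sort available, a parser that
does not recognize a kind can still advance past the record
by the computed size and carry on with the next type; it
only gives up if the kind is described nowhere.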

Steps 1 and 2 are accomplished in patches 1 and 2.
Patches 3 and 4 tackle step 4 for userspace and kernel
respectively.  Finally patch 5 tests BTF kind encoding
and decoding.

To support BTF kind encoding for kernel BTF, pahole
would have to be updated to call btf__add_kinds().
The dwarves [1] and libbpf [2] branches can be used to
try this out.

More details are provided in the individual patches.

One potential application of this approach would be a
stable backport of patches 1 and 3; this would allow
older kernels to use the latest pahole without adding
further "skip" directives each time a new kind is
added.

So assuming something like this landed, how would it
affect adding a new kind?  Once that kind was available
in the libbpf that dwarves uses, BTF would contain
instances of that new kind.  However, if an older libbpf
(one that had support for parsing kind descriptions)
encountered it, parsing would still work; the information
encoded in the new kind would simply not be available.

So the result would be that a new kind could be added
without breaking BTF parsing.

[1] https://github.com/alan-maguire/dwarves/tree/btf-kind-encoding
[2] https://github.com/alan-maguire/libbpf/tree/btf-kind-encoding

Alan Maguire (5):
  bpf: add kind/metadata prefixes to uapi/linux/btf.h
  libbpf: provide libbpf API to encode BTF kind information
  libbpf: use BTF-encoded kind information to help parse unrecognized
    kinds
  bpf: parse unrecognized kind info using encoded kind information (if
    present)
  selftests/bpf: test kind encoding/decoding

 include/uapi/linux/btf.h                          |   7 +
 kernel/bpf/btf.c                                  |  87 +++++-
 tools/include/uapi/linux/btf.h                    |   7 +
 tools/lib/bpf/btf.c                               | 357 ++++++++++++++++++++++
 tools/lib/bpf/btf.h                               |  10 +
 tools/lib/bpf/libbpf.map                          |   1 +
 tools/testing/selftests/bpf/prog_tests/btf_kind.c | 234 ++++++++++++++
 7 files changed, 696 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_kind.c

-- 
1.8.3.1