[PATCH dwarves 0/7] Add support for generating BTF for all variables

Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> · Fri, 26 Aug 2022 11:49:04 -0700

Hello everyone,

BTF offers some exciting new possibilities beyond its original intent;
one of these is making the kernel more self-describing for debug tools.
Kallsyms contains symbol table data, and ORC (for x86_64) contains
information to help unwind stacks. Now, BTF can provide type information
for functions and variables. Taken together, this data is enough to
power the basic (read-only) functions of a postmortem or live debugger,
without falling back on the heavier debugging information formats like
DWARF. What's more, all of these data sources are contained within the
kernel image itself, and are thus available on live systems and within
crash dumps, without consulting any external debug information files.

However, BTF generation currently emits information only for percpu
variables. This patch series removes that limitation, allowing BTF to
be generated for all variables and thus providing complete type
information for debuggers.

Of course, generating additional BTF means that more data must be stored
in the kernel image, and that may not be okay for everyone. Thus, the
new behavior must be explicitly enabled by a flag.

Testing
-------

To verify this change and illustrate the additional space required, I
built v5.19-rc7 on x86_defconfig, with the following additionally
enabled:

enable DEBUG_INFO_DWARF4
enable BPF_SYSCALL
enable DEBUG_INFO_BTF

I then ran pahole to generate BTF from the built vmlinux in three
configurations, and recorded the size of the BTF for each:

1) using the current master branch
   size: 5505315 bytes
2) using this patched version, without enabling --encode_all_btf_vars
   size: 5505315 bytes
3) using this patched version, with --encode_all_btf_vars enabled
   size: 6811291 bytes

This is a total increase of 1305976 bytes (about 1.25 MiB), or 23.7%.
That is definitely notable, but not unreasonable for many use cases such
as desktop or server applications. I also verified that the data
generated by cases 1 and 2 are byte-for-byte identical: that is, there
are no changes to the generated BTF unless --encode_all_btf_vars is
enabled.
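
For reference, the size figures can be re-derived from the byte counts
listed above. This is only a small arithmetic sketch; the variable names
are illustrative and not part of pahole or the patch series.

```python
# BTF sizes reported above, in bytes.
baseline = 5505315   # cases 1 and 2: master, or patched without the flag
all_vars = 6811291   # case 3: patched, with --encode_all_btf_vars

delta = all_vars - baseline              # extra bytes of BTF
delta_mib = delta / (1024 * 1024)        # same amount in MiB
percent = 100 * delta / baseline         # growth relative to the baseline

print(f"{delta} bytes = {delta_mib:.2f} MiB, +{percent:.1f}%")
```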

I also verified that the output variables make sense. I created an
application which parses the output BTF and dumps the
declarations (BTF_KIND_VAR and BTF_KIND_FUNC), and then diffed its
output between configuration 2 and 3. I'm happy to provide a link to
that diff (it's of course too big to include in the email).
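
For anyone who wants to run a similar check, here is a minimal sketch of
such a dumper. It is not the application mentioned above: it is a
hypothetical stand-in that assumes a little-endian raw BTF blob (the
layout documented for struct btf_header and struct btf_type) and only
lists the names of BTF_KIND_VAR and BTF_KIND_FUNC entries, without
resolving their types. The function names are made up for illustration.

```python
import struct

BTF_MAGIC = 0xEB9F
BTF_KIND_FUNC, BTF_KIND_VAR = 12, 14


def extra_bytes(kind, vlen):
    """Size of the kind-specific data that follows a struct btf_type."""
    if kind in (1, 14, 17):        # INT, VAR, DECL_TAG: a single u32
        return 4
    if kind == 3:                  # ARRAY: struct btf_array
        return 12
    if kind in (4, 5, 15, 19):     # STRUCT, UNION, DATASEC, ENUM64: 12 bytes per entry
        return 12 * vlen
    if kind in (6, 13):            # ENUM, FUNC_PROTO: 8 bytes per entry
        return 8 * vlen
    if kind in (2, 7, 8, 9, 10, 11, 12, 16, 18):
        return 0                   # PTR, FWD, TYPEDEF, qualifiers, FUNC, FLOAT, TYPE_TAG
    raise ValueError(f"unknown BTF kind {kind}")


def list_decls(btf):
    """Return [("var" | "func", name), ...] in type-section order."""
    magic, _version, _flags, hdr_len, type_off, type_len, str_off, str_len = \
        struct.unpack_from("<HBBIIIII", btf, 0)
    if magic != BTF_MAGIC:
        raise ValueError("not a little-endian BTF blob")

    # Section offsets are relative to the end of the header.
    str_start = hdr_len + str_off
    strings = btf[str_start:str_start + str_len]
    pos = hdr_len + type_off
    end = pos + type_len

    decls = []
    while pos < end:
        name_off, info, _size_or_type = struct.unpack_from("<III", btf, pos)
        kind = (info >> 24) & 0x1F
        vlen = info & 0xFFFF
        pos += 12 + extra_bytes(kind, vlen)
        if kind in (BTF_KIND_FUNC, BTF_KIND_VAR):
            name_end = strings.index(b"\0", name_off)
            name = strings[name_off:name_end].decode()
            decls.append(("var" if kind == BTF_KIND_VAR else "func", name))
    return decls
```

Given the raw BTF bytes from two builds (for example, read from
/sys/kernel/btf/vmlinux on each kernel), something like
set(list_decls(b)) - set(list_decls(a)) would show the declarations that
appear only in the second build.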

End-to-end test
---------------

To show this is not just theory, I've created an end-to-end test which
combines BTF generated via this patch series, along with a kernel patch
necessary to expose the kallsyms data [1], and a branch of the drgn
debugger[2] which implements kallsyms and BTF parsing. Core dumps
generated on the resulting kernel can be loaded by the drgn debugger,
and it can then read out variables from the dump with full type
information without needing to consult a DWARF debuginfo file.

Future Work
-----------

If this proves acceptable, I'd like to follow up with a kernel patch to
add a configuration option (default=n) for generating BTF with all
variables, which distributions could choose to enable or not.

There was previous discussion[3] about leveraging split BTF or building
additional kernel modules to contain the extra variables. I believe with
this patch series, it is possible to do that. However, I'd argue that
simpler is better here: the advantage of using BTF is having it all
available in the kernel/module image. Storing extra BTF on the
filesystem would break that advantage, and at that point, you'd be
better off using a debuginfo format like CTF, which is lightweight and
expected to be found on the filesystem.

[1]: https://lore.kernel.org/lkml/20220517000508.777145-3-stephen.s.brennan@xxxxxxxxxx/T/
     (The above series is already in the 6.0 RCs)
[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
[3]: https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a675927df@xxxxxxxxxx/

Stephen Brennan (7):
  dutil: return ELF section name when looked up by index
  btf_encoder: Rename percpu structures to variables
  btf_encoder: cache all ELF section info
  btf_encoder: make the variable array dynamic
  btf_encoder: record ELF section for collected variables
  btf_encoder: collect all variables
  btf_encoder: allow encoding all variables

 btf_encoder.c      | 196 +++++++++++++++++++++++++++------------------
 btf_encoder.h      |   8 +-
 dutil.c            |  10 ++-
 dutil.h            |   2 +-
 man-pages/pahole.1 |   6 +-
 pahole.c           |  31 +++++--
 6 files changed, 165 insertions(+), 88 deletions(-)

-- 
2.34.1