On 17/05/2024 12:11, Arnaldo Carvalho de Melo wrote:
> On Fri, May 17, 2024, 7:23 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> > Split BPF Type Format (BTF) provides huge advantages in that kernel
> > modules only have to provide type information for types that they do
> > not share with the core kernel; for core kernel types, split BTF
> > refers to core kernel BTF type ids. So for a STRUCT sk_buff, a module
> > that uses that structure (or a pointer to it) simply needs to refer
> > to the core kernel type id, saving the need to define the structure
> > and its many dependents. This cuts down on duplication and makes BTF
> > as compact as possible.
> >
> > However, there is a downside. This scheme requires the references
> > from split BTF to base BTF to be valid not just at encoding time, but
> > at use time (when the module is loaded). Even a small change in
> > kernel types can perturb the type ids in core kernel BTF, and due to
> > pahole's parallel processing of compilation units, even an unchanged
> > kernel can have different type ids if BTF is re-generated.
>
> I think it would be informative to mention the recently added
> "reproducible_build" feature, i.e. rephrase to "... if the
> reproducible_build isn't selected via --btf_features..." in the
> relevant documentation.

Yeah, sorry this part should have been updated after the
reproducible_build feature landed.

> - Arnaldo
>
> Sent from smartphone, still on my way back home from LSF/MM+BPF
>
> > So we have a robustness problem for split BTF for cases where a
> > module is not always compiled at the same time as the kernel. This
> > problem is particularly acute for distros which generally want module
> > builders to be able to compile a module for the lifetime of a Linux
> > stable-based release, and have it continue to be valid over the
> > lifetime of that release, even as changes in data structures (and
> > hence BTF types) accrue. Today it's not possible to generate BTF for
> > modules that works beyond the initial kernel it is compiled against -
> > kernel bugfixes etc invalidate the split BTF references to vmlinux
> > BTF, and BTF is no longer usable for the module.
> >
> > The goal of this series is to provide options that supply additional
> > context for cases like this. That context comes in the form of
> > distilled base BTF; it stands in for the base BTF, and contains
> > information about the types referenced from split BTF, but not their
> > full descriptions. The modified split BTF will refer to type ids in
> > this .BTF.base section, and when the kernel loads such modules it
> > will use that .BTF.base to map references from split BTF to the
> > equivalent current vmlinux base BTF types. Once this relocation
> > process has succeeded, the module BTF available in /sys/kernel/btf
> > will look exactly as if it was built with the current vmlinux;
> > references to base types will be fixed up etc.
> >
> > A module builder - using this series along with the pahole changes -
> > can then build a module with distilled base BTF via an out-of-tree
> > module build, i.e.
> >
> > make -C . M=path/2/module
> >
> > The module will have a .BTF section (the split BTF) and a
> > .BTF.base section. The latter is small in size - distilled base
> > BTF does not need full struct/union/enum information for named
> > types for example. For 2667 modules built with distilled base BTF,
> > the average size observed was 1556 bytes (stddev 1563). The overall
> > size added to these 2667 modules was 5.3Mb.
> >
> > Note that for the in-tree modules, this approach is not needed as
> > split and base BTF in the case of in-tree modules are always built
> > and re-built together.
> >
> > The series first focuses on generating split BTF with distilled base
> > BTF, and provides btf__parse_opts() which allows specification
> > of the section name from which to read BTF data, since we now have
> > both .BTF and .BTF.base sections that can contain such data.
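
To make the mapping described above a bit more concrete, here is a
minimal sketch of the idea in plain C. It is a toy model only, not the
libbpf or kernel implementation: struct toy_type and the toy_*() helpers
are hypothetical names invented for illustration, types are reduced to a
name plus an optional size, and lookup is a simple linear search rather
than the sorted-by-name matching the series uses. It assumes the usual
BTF id layout, where id 0 is void, base types take ids 1..N and split
types continue from N+1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy record; real BTF describes types with struct btf_type. */
struct toy_type {
	const char *name;	/* type name, e.g. "sk_buff" */
	unsigned int size;	/* 0 when the distilled entry records no size (FWD) */
};

/* Return the 1-based id of the type called 'name' in 'base', or 0 if absent. */
static unsigned int toy_find_by_name(const struct toy_type *base, size_t nr_base,
				     const char *name)
{
	for (size_t i = 0; i < nr_base; i++)
		if (strcmp(base[i].name, name) == 0)
			return (unsigned int)(i + 1);
	return 0;
}

/*
 * Fill id_map so that id_map[distilled id] is the matching id in the
 * current base. id_map must have nr_dist + 1 entries. Returns 0 on
 * success, -1 if a name is missing or a recorded size does not match.
 */
static int toy_build_map(const struct toy_type *dist, size_t nr_dist,
			 const struct toy_type *base, size_t nr_base,
			 unsigned int *id_map)
{
	id_map[0] = 0;	/* id 0 (void) maps to itself */
	for (size_t i = 0; i < nr_dist; i++) {
		unsigned int id = toy_find_by_name(base, nr_base, dist[i].name);

		if (id == 0)
			return -1;
		if (dist[i].size != 0 && dist[i].size != base[id - 1].size)
			return -1;
		id_map[i + 1] = id;
	}
	return 0;
}

/*
 * Rewrite one type id referenced from split BTF. Ids up to nr_dist point
 * into the distilled base and go through id_map; larger ids are the
 * module's own types, which shift because the new base has nr_base types.
 */
static unsigned int toy_relocate_id(unsigned int id, const unsigned int *id_map,
				    size_t nr_dist, size_t nr_base)
{
	assert(id_map != NULL);
	if (id <= nr_dist)
		return id_map[id];
	return id - (unsigned int)nr_dist + (unsigned int)nr_base;
}
```

With a distilled base holding only "sk_buff" and a sized "list_head", and
a current base of four types, toy_build_map() resolves both names, and a
split reference to the module's first own type (id 3) becomes id 5 after
relocation.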
> >
> > Then we add support to resolve_btfids for generating the .BTF.ids
> > section with reference to the .BTF.base section - this ensures the
> > .BTF.ids match those used in the split/base BTF.
> >
> > Finally the series provides the mechanism for relocating split BTF
> > with a new base; the distilled base BTF is used to map the references
> > to base BTF in the split BTF to the new base. For the kernel, this
> > relocation process happens at module load time, and we relocate split
> > BTF references to point at types in the current vmlinux BTF. As part
> > of this, .BTF.ids references need to be mapped also.
> >
> > So concretely, what happens is
> >
> > - we generate split BTF in the .BTF section of a module that refers to
> >   types in the .BTF.base section as base types; the latter are not full
> >   type descriptions but provide information about the base type. So
> >   a STRUCT sk_buff would be represented as a FWD struct sk_buff in
> >   distilled base BTF for example.
> > - when the module is loaded, the split BTF is relocated with vmlinux
> >   BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
> >   in vmlinux BTF and map all split BTF references to the distilled base
> >   FWD sk_buff, replacing them with references to the vmlinux BTF
> >   STRUCT sk_buff.
> >
> > Support is also added to bpftool to be able to display split BTF
> > relative to its .BTF.base section, and also to display the relocated
> > form via the "-R path_to_base_btf" option.
> >
> > A previous approach to this problem [1] utilized standalone BTF for
> > such cases - where the BTF is not defined relative to base BTF so
> > there is no relocation required. The problem with that approach is
> > that from the verifier perspective, some types are special, and
> > having a custom representation of a core kernel type that did not
> > necessarily match the current representation is not tenable. So the
> > approach taken here was to preserve the split BTF model while
> > minimizing the representation of the context needed to relocate split
> > and current vmlinux BTF.
> >
> > To generate distilled .BTF.base sections the associated dwarves
> > patch (to be applied on the "next" branch there) is needed.
> > Without it, things will still work but modules will not be built
> > with a .BTF.base section.
> >
> > Changes since v3[3]:
> >
> > - distill now checks for duplicate-named struct/unions and records
> >   them as a sized struct/union to help identify which of the
> >   multiple base BTF structs/unions it refers to (Eduard, patch 1)
> > - added test support for multiple name handling (Eduard, patch 2)
> > - simplified the string mapping when updating split BTF to use
> >   base BTF instead of distilled base. Since the only string
> >   references split BTF can make to base BTF are the names of
> >   the base types, create a string map from distilled string
> >   offset -> base BTF string offset and update string offsets
> >   by visiting all strings in split BTF; this saves having to
> >   do costly searches of base BTF (Eduard, patch 7,10)
> > - fixed bpftool manpage and indentation issues (Quentin, patch 11)
> >
> > Also explored Eduard's suggestion of doing an implicit fallback
> > to checking for .BTF.base section in btf__parse() when it is
> > called to get base BTF. However while it is doable, it turned
> > out to be difficult operationally. Since fallback is implicit
> > we do not know the source of the BTF - was it from .BTF or
> > .BTF.base? In bpftool, we want to try standalone BTF first,
> > then split, then split with distilled base. Having a way
> > to explicitly request .BTF.base via btf__parse_opts() fits
> > that model better.
> >
> > Changes since v2[4]:
> >
> > - submitted patch to use --btf_features in Makefile.btf for pahole
> >   v1.26 and later separately (Andrii). That has landed in bpf-next
> >   now.
> > - distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
> >   the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
> > - moved to distilling only named types, augmenting split BTF with
> >   associated reference types; this greatly simplifies the distilled
> >   base BTF and the mapping operation between distilled and base
> >   BTF when relocating (most of the series changes, Andrii)
> > - relocation now iterates over base BTF, looking for matches based
> >   on name in distilled BTF. Distilled BTF is pre-sorted by name
> >   (Andrii, patch 8)
> > - removed most redundant compatibility checks aside from struct
> >   size for base types/embedded structs and kind compatibility
> >   (since we only match on name) (Andrii, patch 8)
> > - btf__parse_opts() now replaces btf_parse() internally in libbpf
> >   (Eduard, patch 3)
> >
> > Changes since RFC [5]:
> >
> > - updated terminology; we replace clunky "base reference" BTF with
> >   distilling base BTF into a .BTF.base section. Similarly BTF
> >   reconciliation becomes BTF relocation (Andrii, most patches)
> > - add distilled base BTF by default for out-of-tree modules
> >   (Alexei, patch 8)
> > - distill algorithm updated to record size of embedded struct/union
> >   by recording it as a 0-vlen STRUCT/UNION with size preserved
> >   (Andrii, patch 2)
> > - verify size match on relocation for such STRUCT/UNIONs (Andrii,
> >   patch 9)
> > - with embedded STRUCT/UNION recording size, we can have bpftool
> >   dump a header representation using .BTF.base + .BTF sections
> >   rather than special-casing and refusing to use "format c" for
> >   that case (patch 5)
> > - match enum with enum64 and vice versa (Andrii, patch 9)
> > - ensure that resolve_btfids works with BTF without .BTF.base
> >   section (patch 7)
> > - update tests to cover embedded types, arrays and function
> >   prototypes (patches 3, 12)
> >
> > [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@xxxxxxxxxx/
> > [2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@xxxxxxxxxx/
> > [3] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@xxxxxxxxxx/
> > [4] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@xxxxxxxxxx/
> > [5] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@xxxxxxxxxx/
> >
> > Alan Maguire (11):
> >   libbpf: add btf__distill_base() creating split BTF with distilled base
> >     BTF
> >   selftests/bpf: test distilled base, split BTF generation
> >   libbpf: add btf__parse_opts() API for flexible BTF parsing
> >   bpftool: support displaying raw split BTF using base BTF section as
> >     base
> >   resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
> >     used
> >   kbuild, bpf: add module-specific pahole/resolve_btfids flags for
> >     distilled base BTF
> >   libbpf: split BTF relocation
> >   selftests/bpf: extend distilled BTF tests to cover BTF relocation
> >   module, bpf: store BTF base pointer in struct module
> >   libbpf,bpf: share BTF relocate-related code with kernel
> >   bpftool: support displaying relocated-with-base split BTF
> >
> >  include/linux/btf.h                           |  45 ++
> >  include/linux/module.h                        |   2 +
> >  kernel/bpf/Makefile                           |   8 +
> >  kernel/bpf/btf.c                              | 166 +++--
> >  kernel/module/main.c                          |   5 +-
> >  scripts/Makefile.btf                          |   7 +
> >  scripts/Makefile.modfinal                     |   4 +-
> >  .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
> >  tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
> >  tools/bpf/bpftool/btf.c                       |  19 +-
> >  tools/bpf/bpftool/main.c                      |  14 +-
> >  tools/bpf/bpftool/main.h                      |   2 +
> >  tools/bpf/resolve_btfids/main.c               |  28 +-
> >  tools/lib/bpf/Build                           |   2 +-
> >  tools/lib/bpf/btf.c                           | 605 +++++++++++++-----
> >  tools/lib/bpf/btf.h                           |  59 ++
> >  tools/lib/bpf/btf_common.c                    | 143 +++++
> >  tools/lib/bpf/btf_relocate.c                  | 341 ++++++++++
> >  tools/lib/bpf/libbpf.map                      |   3 +
> >  tools/lib/bpf/libbpf_internal.h               |   3 +
> >  .../selftests/bpf/prog_tests/btf_distill.c    | 346 ++++++++++
> >  21 files changed, 1612 insertions(+), 212 deletions(-)
> >  create mode 100644 tools/lib/bpf/btf_common.c
> >  create mode 100644 tools/lib/bpf/btf_relocate.c
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
> >
> > --
> > 2.31.1