On Mon, Apr 29, 2024 at 10:31 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > > On 29/04/2024 18:05, Andrii Nakryiko wrote: > > On Mon, Apr 29, 2024 at 8:25 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >> > >> On 27/04/2024 01:24, Andrii Nakryiko wrote: > >>> On Fri, Apr 26, 2024 at 3:56 PM Andrii Nakryiko > >>> <andrii.nakryiko@xxxxxxxxx> wrote: > >>>> > >>>> On Wed, Apr 24, 2024 at 8:48 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > >>>>> > >>>>> Split BPF Type Format (BTF) provides huge advantages in that kernel > >>>>> modules only have to provide type information for types that they do not > >>>>> share with the core kernel; for core kernel types, split BTF refers to > >>>>> core kernel BTF type ids. So for a STRUCT sk_buff, a module that > >>>>> uses that structure (or a pointer to it) simply needs to refer to the > >>>>> core kernel type id, saving the need to define the structure and its many > >>>>> dependents. This cuts down on duplication and makes BTF as compact > >>>>> as possible. > >>>>> > >>>>> However, there is a downside. This scheme requires the references from > >>>>> split BTF to base BTF to be valid not just at encoding time, but at use > >>>>> time (when the module is loaded). Even a small change in kernel types > >>>>> can perturb the type ids in core kernel BTF, and due to pahole's > >>>>> parallel processing of compilation units, even an unchanged kernel can > >>>>> have different type ids if BTF is re-generated. So we have a robustness > >>>>> problem for split BTF for cases where a module is not always compiled at > >>>>> the same time as the kernel. This problem is particularly acute for > >>>>> distros which generally want module builders to be able to compile a > >>>>> module for the lifetime of a Linux stable-based release, and have it > >>>>> continue to be valid over the lifetime of that release, even as changes > >>>>> in data structures (and hence BTF types) accrue. Today it's not > >>>>> possible to generate BTF for modules that works beyond the initial > >>>>> kernel it is compiled against - kernel bugfixes etc invalidate the split > >>>>> BTF references to vmlinux BTF, and BTF is no longer usable for the > >>>>> module. > >>>>> > >>>>> The goal of this series is to provide options to provide additional > >>>>> context for cases like this. That context comes in the form of > >>>>> distilled base BTF; it stands in for the base BTF, and contains > >>>>> information about the types referenced from split BTF, but not their > >>>>> full descriptions. The modified split BTF will refer to type ids in > >>>>> this .BTF.base section, and when the kernel loads such modules it > >>>>> will use that base BTF to map references from split BTF to the > >>>>> current vmlinux BTF - a process of relocating split BTF with the > >>>>> currently-running kernel's vmlinux base BTF. > >>>>> > >>>>> A module builder - using this series along with the pahole changes - > >>>>> can then build a module with distilled base BTF via an out-of-tree > >>>>> module build, i.e. > >>>>> > >>>>> make -C . M=path/2/module > >>>>> > >>>>> The module will have a .BTF section (the split BTF) and a > >>>>> .BTF.base section. The latter is small in size - distilled base > >>>>> BTF does not need full struct/union/enum information for named > >>>>> types for example. For 2667 modules built with distilled base BTF, > >>>>> the average size observed was 1556 bytes (stddev 1563). > >>>>> > >>>>> Note that for the in-tree modules, this approach is not needed as > >>>>> split and base BTF in the case of in-tree modules are always built > >>>>> and re-built together. > >>>>> > >>>>> The series first focuses on generating split BTF with distilled base > >>>>> BTF, and provides btf__parse_opts() which allows specification > >>>>> of the section name from which to read BTF data, since we now have > >>>>> both .BTF and .BTF.base sections that can contain such data. > >>>>> > >>>>> Then we add support to resolve_btfids for generating the .BTF.ids > >>>>> section with reference to the .BTF.base section - this ensures the > >>>>> .BTF.ids match those used in the split/base BTF. > >>>>> > >>>>> Finally the series provides the mechanism for relocating split BTF with > >>>>> a new base; the distilled base BTF is used to map the references to base > >>>>> BTF in the split BTF to the new base. For the kernel, this relocation > >>>>> process happens at module load time, and we relocate split BTF > >>>>> references to point at types in the current vmlinux BTF. As part of > >>>>> this, .BTF.ids references need to be mapped also. > >>>>> > >>>>> So concretely, what happens is > >>>>> > >>>>> - we generate split BTF in the .BTF section of a module that refers to > >>>>> types in the .BTF.base section as base types; these are not full > >>>>> type descriptions but provide information about the base type. So > >>>>> a STRUCT sk_buff would be represented as a FWD struct sk_buff in > >>>>> distilled base BTF for example. > >>>>> - when the module is loaded, the split BTF is relocated with vmlinux > >>>>> BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff > >>>>> in vmlinux BTF and map all split BTF references to the distilled base > >>>>> FWD sk_buff, replacing them with references to the vmlinux BTF > >>>>> STRUCT sk_buff. > >>>>> > >>>>> Support is also added to bpftool to be able to display split BTF > >>>>> relative to its .BTF.base section, and also to display the relocated > >>>>> form via the "-R path_to_base_btf". > >>>>> > >>>>> A previous approach to this problem [1] utilized standalone BTF for such > >>>>> cases - where the BTF is not defined relative to base BTF so there is no > >>>>> relocation required. The problem with that approach is that from > >>>>> the verifier perspective, some types are special, and having a custom > >>>>> representation of a core kernel type that did not necessarily match the > >>>>> current representation is not tenable. So the approach taken here was > >>>>> to preserve the split BTF model while minimizing the representation of > >>>>> the context needed to relocate split and current vmlinux BTF. > >>>>> > >>>>> To generate distilled .BTF.base sections the associated dwarves > >>>>> patch (to be applied on the "next" branch there) is needed. > >>>>> Without it, things will still work but bpf_testmod will not be built > >>>>> with a .BTF.base section. > >>>>> > >>>>> Changes since RFC [2]: > >>>>> > >>>>> - updated terminology; we replace clunky "base reference" BTF with > >>>>> distilling base BTF into a .BTF.base section. Similarly BTF > >>>>> reconcilation becomes BTF relocation (Andrii, most patches) > >>>>> - add distilled base BTF by default for out-of-tree modules > >>>>> (Alexei, patch 8) > >>>>> - distill algorithm updated to record size of embedded struct/union > >>>>> by recording it as a 0-vlen STRUCT/UNION with size preserved > >>>>> (Andrii, patch 2) > >>>>> - verify size match on relocation for such STRUCT/UNIONs (Andrii, > >>>>> patch 9) > >>>>> - with embedded STRUCT/UNION recording size, we can have bpftool > >>>>> dump a header representation using .BTF.base + .BTF sections > >>>>> rather than special-casing and refusing to use "format c" for > >>>>> that case (patch 5) > >>>>> - match enum with enum64 and vice versa (Andrii, patch 9) > >>>>> - ensure that resolve_btfids works with BTF without .BTF.base > >>>>> section (patch 7) > >>>>> - update tests to cover embedded types, arrays and function > >>>>> prototypes (patches 3, 12) > >>>>> > >>>>> One change not made yet is adding anonymous struct/unions that the split > >>>>> BTF references in base BTF to the module instead of adding them to the > >>>>> .BTF.base section. That would involve having to maintain two pipes for > >>>>> writing BTF, one for the .BTF.base and one for the split BTF. It would > >>>>> be possible, but there are I think some edge cases that might make it > >>>>> tricky. For example consider a split BTF reference to a base BTF > >>>>> ARRAY which in turn referenced an anonymous STRUCT as type. In such a > >>>>> case, it wouldn't make sense to have the array in the .BTF.base section > >>>>> while having the STRUCT in the module. The general concern is that once > >>>> > >>>> Hm.. not really? ARRAY is a reference type (and anonymous at that), so > >>>> it would have to stay in module's BTF, no? I'll go read the patch > >>>> series again, but let me know if I'm missing something. > >>>> > >> > >> The way things currently work, we preserve all relationships prior to > >> distilling base BTF. That is, if a type was in split BTF prior to > >> calling btf__distill_base(), it will stay in split BTF afterwards. Ditto > >> for base types. This is true for reference types as well as named types. > >> So in the case of the above array for example, prior to distilling types > >> it is in base BTF. If it in turn then referred to a base anonymous > >> struct, both would be in the base and thus the distilled base BTF. In > >> the above case, I was suggesting the array itself was referred to from > >> split BTF, but not in split BTF, sorry if that wasn't clearer. > >> > >> So the problem comes if we moved the anon struct to the module; then we > >> also need to move types that depend on it there. This means we'd need to > >> make the move recursive. That seems doable; the only question is around > > > > Yep, it should be very doable. We just mark everything used from > > "to-be-moved-to-new-split-BTF" types recursively, unless it's > > "qualified named type", where we stop. You have a pass to mark > > embedded types, here it might be another pass to mark > > "used-by-split-BTF-types-but-not-distillable" types. > > > >> the logistics and the effects of doing so. At one extreme we might end > >> up with something that resembles standalone BTF (many/most types in the > > > > My hypothesis is that it is very unlikely that there will be a lot of > > types that have to be copied into split BTF. > > > >> split BTF). That seems unlikely in most cases. I examined one module's > >> BTF base for example, and the only anon structs arose from typedef > >> references possible_net_t, sockptr_t, rwlock_t and atomic_t. These in > >> turn were only referenced once elsewhere in distilled base BTF; a > >> sockptr was in a FUNC_PROTO, but aside from that the typedefs were not > >> otherwise referenced in distilled base BTF, they were referenced in > >> split BTF as embeeded struct field types. > >> > >> So moving all of this to the split BTF seems possible; what I think we > >> probably need to think on a bit is how to handle relocation. Is there a > >> need to relocate these module types too, or can we live with having > >> duplicate atomic_t/sockptr_t typedefs in the module? Currently > >> relocation is simplified by the fact that we only need to relocate the > >> types prior to the module's start id. All we need to do is rewrite type > >> references in split BTF to base ids. If we were relocating split types > >> too we'd need to remove them from split BTF. > > > > I think anything that is not in distilled base should not be > > relocated, so current simplicity is remapping distilled BTF IDs will > > remain. It's ok to have clones/copies of some simple typedefs, > > probably. > > > > We have a few somewhat competing goals here and we need to make a > > tradeoff between them: > > > > a) minimizing split BTF size (or rather not making it too large) > > b) making sure PTR_TO_BTF_ID types work (so module kfuncs can accept > > task_struct and others) > > c) keeping relocation simple, fast, and reliable/unambiguous > > > > By copying anonymous types we potentially hurt a) (but presumably not > > a lot to worry about), and we significantly improve c) by making > > relocation simple/fast/reliably (to the extent possible with "by name" > > lookups). And we (presumably) don't change b), it still works for all > > existing and future cases. > > > > Yeah, case b) is the only lingering concern I have, but in practice it > seems unlikely to arise. One point of clarification - we've discussed so > far mostly anonymous STRUCTs and UNIONs; do you think there are other > anonymous types we should consider, ARRAYs for example? Everything is technically possible, but I'd be surprised if anything but STRUCT/UNION is referred to by PTR_TO_BTF_ID for kfunc. But let's get there first. > > If we ever need to pass anonymous typedef'ed types to kfunc, we'll > > need to think how to represent them in distilled base BTF. But it most > > probably won't be TYPEDEF -> STRUCT chain, but rather empty STRUCT > > with the name of original TYPEDEF + some bit to specify that we are > > looking for a TYPEDEF in real base BTF; I think we have a pass forward > > here, and that's the main thing, but I don't think it's a problem > > worth solving now (or ever). > > > > WDYT? > > Agreed. I think (hope) it's unlikely to arise. > > > > >> > >>>>> we move a type to the module we would need to also ensure any base types > >>>>> that refer to it move there too. For now it is I think simpler to > >>>>> retain the existing split/base type classifications. > >>>> > >>>> We would have to finalize this part before landing, as it has big > >>>> implications on the relocation process. > >>> > >>> Ran out of time, sorry, will continue on Monday. But please consider, > >>> meanwhile, what I mentioned about only having named > >>> structs/unions/enums in distilled base BTF. > >>> > >> > >> Sure, I'll dig into it further. FWIW I agree with the goal of moving > >> anonymous structs/unions if it's doable. I can't see any blocking issues > >> thus far. > > > > Yep, please give it a go, and I'll try to finish the review today, thanks. > > > >> > >>>> > >>>> > >>>>> > >>>>> [1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@xxxxxxxxxx/ > >>>>> [2] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@xxxxxxxxxx/ > >>>>> > >>>>> > >>>>> > >>>>> Alan Maguire (13): > >>>>> libbpf: add support to btf__add_fwd() for ENUM64 > >>>>> libbpf: add btf__distill_base() creating split BTF with distilled base > >>>>> BTF > >>>>> selftests/bpf: test distilled base, split BTF generation > >>>>> libbpf: add btf__parse_opts() API for flexible BTF parsing > >>>>> bpftool: support displaying raw split BTF using base BTF section as > >>>>> base > >>>>> kbuild,bpf: switch to using --btf_features for pahole v1.26 and later > >>>>> resolve_btfids: use .BTF.base ELF section as base BTF if -B option is > >>>>> used > >>>>> kbuild, bpf: add module-specific pahole/resolve_btfids flags for > >>>>> distilled base BTF > >>>>> libbpf: split BTF relocation > >>>>> module, bpf: store BTF base pointer in struct module > >>>>> libbpf,bpf: share BTF relocate-related code with kernel > >>>>> selftests/bpf: extend distilled BTF tests to cover BTF relocation > >>>>> bpftool: support displaying relocated-with-base split BTF > >>>>> > >>>>> include/linux/btf.h | 32 + > >>>>> include/linux/module.h | 2 + > >>>>> kernel/bpf/Makefile | 8 + > >>>>> kernel/bpf/btf.c | 227 +++++-- > >>>>> kernel/module/main.c | 5 +- > >>>>> scripts/Makefile.btf | 12 +- > >>>>> scripts/Makefile.modfinal | 4 +- > >>>>> .../bpf/bpftool/Documentation/bpftool-btf.rst | 15 +- > >>>>> tools/bpf/bpftool/bash-completion/bpftool | 7 +- > >>>>> tools/bpf/bpftool/btf.c | 20 +- > >>>>> tools/bpf/bpftool/main.c | 14 +- > >>>>> tools/bpf/bpftool/main.h | 2 + > >>>>> tools/bpf/resolve_btfids/main.c | 22 +- > >>>>> tools/lib/bpf/Build | 2 +- > >>>>> tools/lib/bpf/btf.c | 561 +++++++++++----- > >>>>> tools/lib/bpf/btf.h | 61 ++ > >>>>> tools/lib/bpf/btf_common.c | 146 ++++ > >>>>> tools/lib/bpf/btf_relocate.c | 630 ++++++++++++++++++ > >>>>> tools/lib/bpf/libbpf.map | 3 + > >>>>> tools/lib/bpf/libbpf_internal.h | 2 + > >>>>> .../selftests/bpf/prog_tests/btf_distill.c | 298 +++++++++ > >>>>> 21 files changed, 1864 insertions(+), 209 deletions(-) > >>>>> create mode 100644 tools/lib/bpf/btf_common.c > >>>>> create mode 100644 tools/lib/bpf/btf_relocate.c > >>>>> create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c > >>>>> > >>>>> -- > >>>>> 2.31.1 > >>>>> > >>>