Re: [PATCH bpf-next v4] bpftool: add support for split BTF to gen min_core_btf

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 9 Feb 2024 11:50:51 -0800

On Thu, Feb 8, 2024 at 2:45 PM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 08/02/2024 00:26, Andrii Nakryiko wrote:
> > On Tue, Feb 6, 2024 at 2:59 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>
> >> On 02/02/2024 22:16, Andrii Nakryiko wrote:
> >>> On Wed, Jan 31, 2024 at 10:47 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 30/01/2024 23:05, Bryce Kahle wrote:
> >>>>> From: Bryce Kahle <bryce.kahle@xxxxxxxxxxxxx>
> >>>>>
> >>>>> Enables a user to generate minimized kernel module BTF.
> >>>>>
> >>>>> If an eBPF program probes a function within a kernel module or uses
> >>>>> types that come from a kernel module, split BTF is required. The split
> >>>>> module BTF contains only the BTF types that are unique to the module.
> >>>>> It will reference the base/vmlinux BTF types and always starts its type
> >>>>> IDs at X+1 where X is the largest type ID in the base BTF.
> >>>>>
> >>>>> Minimization allows a user to ship only the types necessary to do
> >>>>> relocations for the program(s) in the provided eBPF object file(s). A
> >>>>> minimized module BTF will still not contain vmlinux BTF types, so you
> >>>>> should always minimize the vmlinux file first, and then minimize the
> >>>>> kernel module file.
> >>>>>
> >>>>> Example:
> >>>>>
> >>>>> bpftool gen min_core_btf vmlinux.btf vm-min.btf prog.bpf.o
> >>>>> bpftool -B vm-min.btf gen min_core_btf mod.btf mod-min.btf prog.bpf.o
> >>>>
> >>>> This is great! I've been working on a somewhat related problem involving
> >>>> split BTF for modules, and I'm trying to figure out if there's overlap
> >>>> with what you've done here that can help in either direction. I'll try
> >>>> and describe what I'm doing. Sorry if this is a bit of a diversion,
> >>>> but I just want to check if there are potential ways your changes could
> >>>> facilitate other scenarios in the future.
> >>>>
> >>>> The problem I'm trying to tackle is to enable split BTF module
> >>>> generation to be more resilient to underlying kernel BTF changes;
> >>>> this would allow for example a module that is not built with the kernel
> >>>> to generate BTF and have it work even if small changes in vmlinux occur.
> >>>> Even a small change in BTF ids in base BTF is enough to invalidate the
> >>>> associated split BTF, so the question is how to make this a bit less
> >>>> brittle. This won't be needed for modules built along with the kernel,
> >>>> but more for cases like a package delivering a kernel module.
> >>>>
> >>>> The way this is done is similar to what you're doing - generating
> >>>> minimal base vmlinux BTF along with the module BTF. In my case however
> >>>> the minimization is not driven by CO-RE relocations; rather it is driven
> >>>> by only adding types that are referenced by module BTF and any other
> >>>> associated types needed. We end up with minimal base BTF that is carried
> >>>> along with the module BTF (in a .BTF.base_minimal section) and this
> >>>> minimal BTF will be used to later reconcile module BTF with the running
> >>>> kernel BTF when the module is loaded; it essentially provides the
> >>>> additional information needed to map to current vmlinux types.
> >>>>
> >>>> In this approach, minimal vmlinux BTF is generated via an additional
> >>>> option to pahole which adds an extra phase to BTF deduplication between
> >>>> module and kernel. Once we have found the candidate mappings for
> >>>> deduplication, we can look at all base BTF references from module BTF
> >>>> and recursively add associated types to the base minimal BTF. Finally we
> >>>> reparent the split BTF to this minimal base BTF. Experiments show most
> >>>> modules wind up with base minimal BTF of around 4000 types, so the
> >>>> minimization seems to work well. But it's complex.
> >>>>
> >>>> So what I've been trying to work out is if this dedup complexity can be
> >>>> eliminated with your changes, but from what I can see, the membership in
> >>>> the minimal base BTF in your case is driven by the CO-RE relocations
> >>>> used in the BPF program. Do you think there could be a future where we
> >>>> would look at doing base minimal BTF generation by other criteria (like
> >>>> references from the module BTF)? Thanks!
> >>>
> >>> Hm... I might be misremembering or missing something, but the problem
> >>> you are solving doesn't seem to be related to BTF minimization. I also
> >>> forgot why you need BTF deduplication, I vaguely remember we needed to
> >>> remember "expectations" of types that module BTF references in vmlinux
> >>> BTF, but I fail to remember why we needed dedup... Perhaps we need a
> >>> BPF office hours session to go over details again?
> >>>
> >>
> >> Yeah, that would be great! I've put
> >>
> >> Making split BTF more resilient
> >>
> >> ..on the agenda for 02-15.
> >>
> >> The reason BTF minimization comes into the picture is this - the
> >> expectations split BTF can have of base BTF can be quite complex, and in
> >> figuring out ways to represent them, it occurred that BTF itself - in
> >> the form of the minimal BTF needed to represent those split BTF
> >> references - made sense. Consider cases like a split BTF struct that
> >> contains a base BTF struct embedded in it. If we have a minimal base BTF
> >> which contains such needed base types, we are in a position to use it to
> >> later reconcile the base BTF worlds at encoding time and use time (for
> >> example vmlinux BTF at module build time versus current vmlinux BTF).
> >>
> >> Further, a natural time to construct that minimal base BTF presents
> >> itself when we do deduplication between split and base BTF.  The phase
> >> after we have mapped split types to canonical types is the ideal time to
> >> handle this; the algorithm is basically
> >>
> >> - foreach reference from split -> base BTF
> >>   - add it to base minimal BTF
> >>
> >> This is controlled by a new dedup option - gen_base_btf_minimal - which
> >> would be enabled via a --btf_features option to pahole for users who
> >> wanted to generate minimal base BTF. pahole places the new minimized
> >> base BTF in .BTF.base_minimal section, with the split BTF referring to
> >> it in the usual .BTF section. Later this base minimal BTF is used to
> >> reconcile the split BTF expectations with current base BTF.
> >>
> >> The kinds of minimizations I see are pretty reasonable for kernel
> >> modules; I tried a number of in-tree modules (which wouldn't use this
> >> feature in practice, just wanted to have something to test with), and
> >> around 4000 types were observed in base minimal BTF.
> >>
> >> It's possible we could adapt this minimization process to be guided
> >> by CO-RE relocations (rather than split->base BTF references), if that
> >> would help Bryce's case.
> >
> > I think this minimization idea is overcomplicating things. First, we
> > don't have CO-RE relocations, and from BTF alone we don't know which
> > fields of base BTF structs the module is referencing (that may or may
> > not be in DWARF). So I don't think there is anything to minimize.
> >
>
> The minimization is a method to capture expectations of base BTF similar

An important part of btfgen's minimization is keeping only used
fields (according to CO-RE relocs) and stripping away everything else.
Your "minimization" is quite different, and so referring to both as
"minimization" is just going to confuse things.

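To make the distinction concrete, here is a toy Python sketch of the
field-level trimming described above. It is not bpftool code: the
Member/Struct classes, trim_to_used_fields() and the example layout are
all hypothetical, and keeping the original struct size and member
offsets is an assumption of this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    type_name: str
    offset: int  # byte offset within the struct


@dataclass(frozen=True)
class Struct:
    name: str
    size: int
    members: tuple


def trim_to_used_fields(struct, used):
    """Keep only members named in `used` (standing in for the fields that
    CO-RE relocations touch); size and offsets are left unchanged."""
    kept = tuple(m for m in struct.members if m.name in used)
    return Struct(struct.name, struct.size, kept)


# Hypothetical, heavily simplified layout; not the real task_struct.
task = Struct("task_struct", 64, (
    Member("pid", "int", 0),
    Member("tgid", "int", 4),
    Member("comm", "char[16]", 8),
))
trimmed = trim_to_used_fields(task, {"pid"})
```

A closure over the types a module references has no per-field usage
information to work from, so it cannot do this kind of trimming.
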
> to what you describe below. In the approach I've been pursuing, we
> capture those expectations via the minimal base BTF needed to represent
> the types the module needs.
>
> > On the other hand, it seems reasonable to record a few basic things
> > about base BTF type expectations:
> >   - name
> >   - size and whether that size has to be exact. This would be
> > determined if base BTF type is ever embedded or is only referenced by
> > pointer;
> >   - we can record number of fields, but you said you want to enable
> > extensions, so it will have to be treated as minimum number of fields,
> > probably?
> >
>
> Yeah, the motivation here is that often when changes are backported to
> stable release-based distros, the associated struct changes try to fill
> holes in existing structures so that overall structure size does not
> change in an incompatible way, and any modules that utilize such
> structures continue to work.
>
> > Basically, all we want to ensure is that overall memory layout is
> > compatible and doesn't cause any module field to be shifted.
> >
>
> There are a few other gotchas though. Consider the case of an enum; if
> the values associated with it get shifted between the time the module is
> built and the time it is used, so that ENUM_VAL_X, which was 1 when the
> module was built, is now 2 in base vmlinux, we'd need to track that as an
> incompatibility too.

The enum case is a bit weird. If an enum is defined in vmlinux BTF, then
the base kernel is built with and uses that definition of the enum, right?
So even if a module's enum definition is different (different integer
values), the base's enum definition should probably be used in BTF
instead, no?

>
> A minimized view of base BTF - driven by the types the module needs -
> can capture these changes along with the field offset/size issues. The
> approach I use today also avoids expanding types unnecessarily; when it
> encounters a pointer to struct foo in the module representation only,
> the minimized base BTF will just use a fwd representation of that struct
> in minimal base BTF.

So this is basically the only part in common with btfgen's minimization,
but overall they are quite different, which is why I'm suggesting we
not combine them.
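
For reference, the reference-driven closure with fwd declarations for
pointer-only types can be sketched roughly as below. This is a toy model
under stated assumptions, not the pahole/libbpf implementation:
base_closure(), the ("embed" | "ptr", name) reference tuples and the
example type graph are all hypothetical, and treating every pointer
target as a fwd candidate is a simplification.

```python
def base_closure(base_types, module_refs):
    """Return (full, fwd): base types that need a full definition and
    base types for which a forward declaration is enough.

    base_types:  name -> list of (kind, target) references made by that type
    module_refs: list of (kind, target) references made by the split BTF
    kind is "embed" (by value) or "ptr" (pointer only).
    """
    full, fwd = set(), set()
    stack = list(module_refs)
    while stack:
        kind, name = stack.pop()
        if kind == "ptr":
            # Pointer-only reference: a fwd suffices unless the type is
            # already needed in full.
            if name not in full:
                fwd.add(name)
            continue
        if name in full:
            continue
        # Embedded by value: need the full type and everything it references.
        full.add(name)
        fwd.discard(name)
        stack.extend(base_types.get(name, []))
    return full, fwd


# Hypothetical base type graph.
base = {
    "list_head": [("ptr", "list_head")],
    "sk_buff": [("embed", "list_head"), ("ptr", "net_device")],
    "net_device": [("embed", "list_head")],
}
full, fwd = base_closure(base, [("embed", "list_head"), ("ptr", "sk_buff")])
```

Here the module embeds list_head, so it is kept in full, while sk_buff
is only reached through a pointer and stays a fwd.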

>
> So to summarize, base BTF minimization is driven by the need to capture
> the set of expectations the module has, similar to what you describe above.
>
> Alan
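
As a rough illustration of the simpler alternative discussed above
(recording name, size, whether the size must be exact, and a minimum
field count), a load-time check might look something like this toy
Python sketch. The Expectation class and is_compatible() are
hypothetical names and this is not an existing kernel or libbpf
interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    name: str
    size: int         # size seen in base BTF at module build time
    exact_size: bool  # True if the module embeds the type by value
    min_fields: int   # field count at build time, treated as a minimum


def is_compatible(exp, name, size, nr_fields):
    """Compare a recorded expectation against the running kernel's type."""
    if exp.name != name:
        return False
    # Embedded types must keep their size so module fields do not shift;
    # pointer-only references tolerate size changes.
    if exp.exact_size and size != exp.size:
        return False
    return nr_fields >= exp.min_fields


embedded = Expectation("list_head", 16, True, 2)
by_pointer = Expectation("sk_buff", 232, False, 10)
```

With these values, is_compatible(by_pointer, "sk_buff", 240, 12) accepts
a grown struct that is only referenced by pointer, while
is_compatible(embedded, "list_head", 24, 2) rejects a size change in an
embedded type.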