On Thu, Feb 8, 2024 at 3:01 PM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote: > > On 08/02/2024 01:56, Yonghong Song wrote: > > > > On 2/7/24 4:30 PM, Andrii Nakryiko wrote: > >> On Wed, Feb 7, 2024 at 2:38 PM Yonghong Song <yonghong.song@xxxxxxxxx> > >> wrote: > >>> > >>> On 2/7/24 10:51 AM, Bryce Kahle wrote: > >>>> On Mon, Feb 5, 2024 at 10:21 AM Andrii Nakryiko > >>>> <andrii.nakryiko@xxxxxxxxx> wrote: > >>>>> 3) btf__dedup() will deduplicate everything, so that only unique type > >>>>> definitions remain. > >>>>> > >>> A random thought about another way. > >>> At module side, we keep > >>> - module btf > >>> - another section (e.g. .BTF.extra) to keep minimum kernel-side > >>> types which directly used by module btf > >>> > > Yep, that's exactly the approach I was pursuing; an extra section > containing those types (I was calling it .BTF.base_minimal). > > >>> for example, module btf has > >>> struct foo { > >>> struct task_struct *t; > >>> } > >>> module btf encoding will have id, say 20, > >>> for 'struct task_struct' which is at that time > >>> the id in linux kernel. > >>> Then the module .BTF.extra contains > >>> id 20: struct task_struct type encoding > >>> there is no need to encode more types beyond pointers. > >>> this can be simpler or more complex depending > >>> on what to do during module load. > >>> > > Right, or in BTF you can use a FWD declaration for task_struct. The > approach I'm using explicitly identifies types that are only > pointer-referenced and uses FWDS for them, and this helps keep the > representation as small as possible. > > >>> When a module load: > >>> For each .BTF.extra entry, trying to match > >>> the corresponding types in the current kernel. > >>> The type in the current type should have same > >>> size as the one in .BTF.extra if otherwise > >>> layout in the module btf may change. > >>> > >>> If new kernel type can be used for module BTF, > >>> simply replace the old id with new id in module BTF. > >>> > >>> Otherwise, type mismatch may happen and the corresponding > >>> module btf type should be invalidated. > > Yep, this is the process I describe as reconciliation; where we make > sure base BTF at encoding time and current vmlinux BTF are compatible, > and if so we renumber base BTF references in the module using the > current vmlinux BTF ids. So if compatible, after reconciliation the > module BTF looks just like any other module BTF built against that exact > vmlinux. > > >> Yes, I agree, see my reply to Alan. I'm just unsure how strict we want > >> to be and whether we need to record fields of expected vmlinux BTF > >> types. Or if just recording expected size would be enough (to ensure > >> correct memory layout if base BTF type is embedded into module BTF > >> type). > >> > >> Perhaps, if BTF type is referenced from some "trusted" BTF type (used > >> by kfunc, or in BTF ID set) we might want to enforce strict > >> compatibility, but for any other type just make sure that size is > >> correct (if it matters at all; i.e., if base BTF type is referenced by > >> pointer only, we don't even need to check size). > > > > Agree. The above is a good start. I guess some real-world investigations > > can help shape the actual design about what is the minimum change to > > make it work. > > > > I'll try and send a pointer to the work-in-progress code prior to the > BPF office hours next week. In investigating how much info is required, > for most in-tree modules (which I force-built with minimal BTF) we ended > up with information about 4000 types or so. So it's a significant > minimization compared to vmlinux BTF. 4000 is still quite a lot and is a significant fraction of vmlinux BTF types. I'm curious if you measured the size increase from recording these types? > > In this context, perhaps my describing the information we collect about > base BTF as minimization is misleading; the intent is really focused not > on making base BTF small (although of course that's important from a > practical perspective), but collecting the info about base BTF needed to > later reconcile it with the running kernel at load time. Maybe > .BTF.base_expects or something like that might make this clearer? Thanks! > > Alan > >> > >> WDYT? > >> > >>>> Since minimization only keeps used struct and union members, couldn't > >>>> you have two internal types from different modules which conflict and > >>>> end up using the wrong offset? > >>>> > >>>> Example: > >>>> in module M: > >>>> struct S { > >>>> ... // other unused members > >>>> int x; // offset 12 (for example) > >>>> } > >>>> > >>>> in module N: > >>>> struct S { > >>>> ... // other unused members > >>>> int x; // offset 20 (something different from S.x in module M) > >>>> } > >>>> > >