On Mon, Nov 02, 2020 at 10:27:20PM -0800, Andrii Nakryiko wrote: > On Mon, Nov 2, 2020 at 9:10 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Wed, Oct 28, 2020 at 05:58:59PM -0700, Andrii Nakryiko wrote: > > > @@ -2942,6 +2948,13 @@ struct btf_dedup { > > > __u32 *hypot_list; > > > size_t hypot_cnt; > > > size_t hypot_cap; > > > + /* Whether hypothethical mapping, if successful, would need to adjust > > > + * already canonicalized types (due to a new forward declaration to > > > + * concrete type resolution). In such case, during split BTF dedup > > > + * candidate type would still be considered as different, because base > > > + * BTF is considered to be immutable. > > > + */ > > > + bool hypot_adjust_canon; > > > > why one flag per dedup session is enough? > > So the entire hypot_xxx state is reset before each struct/union type > graph equivalence check. Then for each struct/union type we might do > potentially many type graph equivalence checks against each of > potential canonical (already deduplicated) struct. Let's keep that in > mind for the answer below. > > > Don't you have a case where some fwd are pointing to base btf and shouldn't > > be adjusted while some are in split btf and should be? > > It seems when this flag is set to true it will miss fwd in split btf? > > So keeping the above note in mind, let's think about this case. You > are saying that some FWDs would have candidates in base BTF, right? > That means that the canonical type we are checking equivalence against > has to be in the base BTF. That also means that all the canonical type > graph types are in the base BTF, right? Because no base BTF type can > reference types from split BTF. This, subsequently, means that no FWDs > from split BTF graph could have canonical matching types in split BTF, > because we are comparing split types against only base BTF types. > > With that, if hypot_adjust_canon is triggered, *entire graph* > shouldn't be matched. No single type in that (connected) graph should > be matched to base BTF. We essentially pretend that canonical type > doesn't even exist for us (modulo the subtle bit of still recording > base BTF's FWD mapping to a concrete type in split BTF for FWD-to-FWD > resolution at the very end, we can ignore that here, though, it's an > ephemeral bookkeeping discarded after dedup). > > In your example you worry about resolving FWD in split BTF to concrete > type in split BTF. If that's possible (i.e., we have duplicates and > enough information to infer the FWD-to-STRUCT mapping), then we'll > have another canonical type to compare against, at which point we'll > establish FWD-to-STRUCT mapping, like usual, and hypot_adjust_canon > will stay false (because we'll be staying with split BTF types only). > > But honestly, with graphs it can get so complicated that I wouldn't be > surprised if I'm still missing something. So far, manually checking > the resulting BTF showed that generated deduped BTF types look > correct. Few cases where module BTFs had duplicated types from vmlinux > I was able to easily find where exactly vmlinux had FWD while modules > had STRUCT/UNION. > > But also, by being conservative with hypot_adjust_canon, the worst > case would be slight duplication of types, which is not the end of the > world. Everything will keep working, no data will be corrupted, libbpf > will still perform CO-RE relocation correctly (because memory layout > of duplicated structs will be consistent across all copies, just like > it was with task_struct until ring_buffers were renamed). Yes. That last part is comforting. The explanation also makes sense. Not worried about it anymore.