On 31/10/2022 15:49, Eduard Zingerman wrote: > On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote: >> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > [...] >>> + >>> +/* >>> + * Collect a `name_off_map` that maps type names to type ids for all >>> + * canonical structs and unions. If the same name is shared by several >>> + * canonical types use a special value 0 to indicate this fact. >>> + */ >>> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) >>> +{ >>> + int i, err = 0; >>> + __u32 type_id, collision_id; >>> + __u16 kind; >>> + struct btf_type *t; >>> + >>> + for (i = 0; i < d->btf->nr_types; i++) { >>> + type_id = d->btf->start_id + i; >>> + t = btf_type_by_id(d->btf, type_id); >>> + kind = btf_kind(t); >>> + >>> + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) >>> + continue; >> >> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 > > Interestingly this is necessary only for mixed enum / enum64 case. > Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type: > Ah, great catch! A forward can look like an enum to one CU but another CU can specify values that make it an enum64. > case BTF_KIND_ENUM: > h = btf_hash_enum(t); > for_each_dedup_cand(d, hash_entry, h) { > cand_id = (__u32)(long)hash_entry->value; > cand = btf_type_by_id(d->btf, cand_id); > if (btf_equal_enum(t, cand)) { > new_id = cand_id; > break; > } > if (btf_compat_enum(t, cand)) { > if (btf_is_enum_fwd(t)) { > /* resolve fwd to full enum */ > new_id = cand_id; > break; > } > /* resolve canonical enum fwd to full enum */ > d->map[cand_id] = type_id; > } > } > break; > // ... similar logic for ENUM64 ... > > - btf_hash_enum ignores vlen when hashing; > - btf_compat_enum compares only names and sizes. > > So, if forward and main declaration kinds match (either BTF_KIND_ENUM > or BTF_KIND_ENUM64) the forward declaration would be removed. But if > the kinds are different the forward declaration would remain. E.g.: > > CU #1: > enum foo; > enum foo *a; > > CU #2: > enum foo { x = 0xfffffffff }; > enum foo *b; > > BTF: > [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1 > 'x' val=68719476735ULL > [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) > [3] PTR '(anon)' type_id=1 > [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0 > [5] PTR '(anon)' type_id=4 > > BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum > forward declarations are not. So it would be incorrect to add enum > forward declaration unification logic to btf_dedup_resolve_fwds, > because the following case would not be covered: > > CU #1: > enum foo; > struct s { enum foo *a; } *a; > > CU #2: > enum foo { x = 0xfffffffff }; > struct s { enum foo *a; } *b; > > Currently STRUCTs 's' are not de-duplicated. > What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly want to avoid deduping "struct s" then since we can't be sure that it is the same enum they are pointing at. That's the logic we employ for structs at least, based upon the rationale that we can't feed back knowledge of types from module to kernel BTF since the latter is now fixed (Andrii, do correct me if I have this wrong). In such a case the enum is no longer standalone; it serves the purpose of allowing us to define a pointer to a module-specific type. We recently found some examples of this sort of thing with structs, where the struct was defined in module BTF, making dedup fail for some core kernel data types, but the problem was restricted to modules which _did_ define the type so wasn't a major driver of dedup failures. Not sure if there's many (any?) enum cases of this in practice. I suppose if we could guarantee the dedup happened within the same object (kernel or module) we could relax this constraint though? Alan