On Thu, Feb 28, 2019 at 11:42 AM Yonghong Song <yhs@xxxxxx> wrote:
>
> On 2/28/19 11:07 AM, Andrii Nakryiko wrote:
> > On Thu, Feb 28, 2019 at 10:19 AM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >> On 2/27/19 2:46 PM, Andrii Nakryiko wrote:
> >>> When checking available canonical candidates for struct/union algorithm
> >>> utilizes btf_dedup_is_equiv to determine if candidate is suitable. This
> >>> check is not enough when candidate is corresponding FWD for that
> >>> struct/union, because according to equivalence logic they are
> >>> equivalent. When it so happens that FWD and STRUCT/UNION end in hashing
> >>> to the same bucket, it's possible to create remapping loop from FWD to
> >>> STRUCT and STRUCT to same FWD, which will cause btf_dedup() to loop
> >>> forever.
> >>>
> >>> This patch fixes the issue by additionally checking that type and
> >>> canonical candidate are strictly equal (utilizing btf_equal_struct).
> >>
> >> It looks like btf_equal_struct() checking equality except
> >> member type id's. Maybe calling it btf_almost_equal_struct() or
> >> something like that?
> >
> > Yes, for struct/union we can't compare types directly, that's what
> > btf_dedup_is_equiv is doing. I think btf_equal_struct w/ comment
> > explaining this particular behavior is good enough. If you insist,
> > though, I'd rather go to something like btf_shallow_equal_struct or
> > something along those lines.
>
> btf_shallow_equal_struct() will be fine.

Ok.
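For readers following the naming discussion, here is a minimal sketch of what a "shallow" struct comparison means: names, size, member names and member offsets are compared, while the type IDs that members refer to are deliberately skipped. The toy_* types and the function below are hypothetical and much simpler than the real struct btf_type layout; this is not the libbpf implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for BTF struct/member records. */
struct toy_member {
	const char *name;
	unsigned int type_id;	/* referenced type, ignored by the shallow check */
	unsigned int offset;
};

struct toy_struct {
	const char *name;
	unsigned int size;
	unsigned int nr_members;
	const struct toy_member *members;
};

/*
 * Shallow equality: everything about the two structs must match except
 * the type IDs their members point to. Those are assumed to be verified
 * separately by a deeper, graph-based equivalence check.
 */
static bool toy_shallow_equal_struct(const struct toy_struct *a,
				     const struct toy_struct *b)
{
	if (strcmp(a->name, b->name) != 0 || a->size != b->size ||
	    a->nr_members != b->nr_members)
		return false;

	for (unsigned int i = 0; i < a->nr_members; i++) {
		if (strcmp(a->members[i].name, b->members[i].name) != 0 ||
		    a->members[i].offset != b->members[i].offset)
			return false;
		/* members[i].type_id is intentionally not compared */
	}
	return true;
}
```

Two structs that differ only in a member's type_id pass this check, which is why the name btf_equal_struct was felt to be slightly misleading.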
>
> >
> >>
> >>>
> >>> Fixes: d5caef5b5655 ("btf: add BTF types deduplication algorithm")
> >>> Reported-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> >>> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> >>> ---
> >>>  tools/lib/bpf/btf.c | 6 +++++-
> >>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> >>> index 6bbb710216e6..53db26d158c9 100644
> >>> --- a/tools/lib/bpf/btf.c
> >>> +++ b/tools/lib/bpf/btf.c
> >>> @@ -2255,7 +2255,7 @@ static void btf_dedup_merge_hypot_map(struct btf_dedup *d)
> >>>  static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
> >>>  {
> >>>  	struct btf_dedup_node *cand_node;
> >>> -	struct btf_type *t;
> >>> +	struct btf_type *cand_type, *t;
> >>>  	/* if we don't find equivalent type, then we are canonical */
> >>>  	__u32 new_id = type_id;
> >>>  	__u16 kind;
> >>> @@ -2275,6 +2275,10 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
> >>>  	for_each_dedup_cand(d, h, cand_node) {
> >>>  		int eq;
> >>>
> >>> +		cand_type = d->btf->types[cand_node->type_id];
> >>> +		if (!btf_equal_struct(t, cand_type))
> >>
> >> The comment for this btf_equal_struct is not quite right.
> >> /*
> >>  * Check structural compatibility of two FUNC_PROTOs, ignoring referenced type
> >>  * IDs. This check is performed during type graph equivalence check and
> >>  * referenced types equivalence is checked separately.
> >>  */
> >> static bool btf_equal_struct(struct btf_type *t1, struct btf_type *t2)
> >>
> >> It should be two "struct/union types".
> >
> > Yep, good catch, will fix!
> >
> >>
> >>> +			continue;
> >>> +
> >>
> >> I did not trace the algorithm how infinite loop happens. But the above
> >
> > Check the test in follow up patch. It has a minimal example that
> > triggers this bug.
> > It happens when we have some FWD x, which we
> > discover that it should be resolved to some STRUCT x (as a result of
> > equivalence check/resolution of some other struct s, that references
> > struct x internally). But that struct x might not have been
> > deduplicated yet, we just record this FWD -> STRUCT mapping so that we
> > don't lose this connection. Later, once we get to deduplication of
> > struct x, FWD x will be (in case of hash collision) one possible
> > candidate to consider for deduplication. At that point,
> > btf_dedup_is_equiv will consider them equivalent (but they are not
> > equal (!), that's where the bug is), so we'll try to resolve STRUCT x
> > -> FWD x, which creates a loop.
> >
> > In btf_dedup_merge_hypot_map() that is used to record discovered
> > "equivalences" during struct/union type graph equivalence check, we
> > have explicit check to never resolve STRUCT/UNION into equivalent FWD,
> > so such loop shouldn't happen, except I missed the case of having FWD
> > as a possible dedup candidate due to hash collision.
> >
> >> change is certainly a correct one, you want to do deduplication only
> >> after everything else (except member types) are equal?
> >
> > Well, if not for special case of FWD == STRUCT/UNION when
> > deduplicating structs, btf_dedup_is_equiv would be enough, because it
> > already checks for btf_equal_struct internally, when both types are
> > struct/union. It's just the special bit at the beginning of is_equiv
> > check that allows FWD and STRUCT/UNION with the same name to be
> > declared equivalent, that throws this off.
> >
> >>
> >> If the bug is due to circle in struct->fwd and fwd->struct mappings,
> >> maybe a simple check whether such circle exists or not before update
> >> the mapping will also work? I am not proposing this fix, but want
> >> to understand better the issue.
> >
> > That's essentially what we use btf_equal_struct for here, really.
> > We could equivalently just check BTF_INFO_KIND(t) == BTF_INFO_KIND(cand)
> > explicitly, but btf_equal_struct feels a bit more generic and
> > obviously correct.
>
> Okay, I see. So the goal is really to prevent processing FWD in the
> struct/union dedup candidate list. It will be good to summarize
> the above detailed explanation in commit message.

Ok, will try to do this more succinctly.

>
> With the above suggested changes,
> Acked-by: Yonghong Song <yhs@xxxxxx>
>
> >
> >>
> >>>  		btf_dedup_clear_hypot_map(d);
> >>>  		eq = btf_dedup_is_equiv(d, type_id, cand_node->type_id);
> >>>  		if (eq < 0)
> >>>
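
The remapping loop described in this thread can be reproduced in a toy model. The sketch below is not libbpf code: the toy_* names, the two-entry map and the name-only equivalence rule are all assumptions made for illustration. It records FWD x -> STRUCT x first, then lets FWD x show up as a dedup candidate for STRUCT x, with and without a strict pre-check in the spirit of the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified type kinds; not the real BTF_KIND_* values. */
enum toy_kind { TOY_FWD, TOY_STRUCT };

struct toy_type {
	enum toy_kind kind;
	const char *name;
};

/* Lenient rule: a FWD and a STRUCT with the same name count as equivalent. */
static bool toy_is_equiv(const struct toy_type *a, const struct toy_type *b)
{
	return strcmp(a->name, b->name) == 0;
}

/* Strict pre-check: kinds must match as well as names. */
static bool toy_shallow_equal(const struct toy_type *a, const struct toy_type *b)
{
	return a->kind == b->kind && strcmp(a->name, b->name) == 0;
}

/*
 * Follow map[] until a self-mapped (canonical) id is reached. Gives up and
 * returns -1 after more than n steps, which can only happen if the chain
 * contains a cycle. The real code had no such bound and would spin forever.
 */
static int toy_resolve(const int *map, int n, int id)
{
	for (int hops = 0; hops <= n; hops++) {
		if (map[id] == id)
			return id;
		id = map[id];
	}
	return -1;
}

/* Try to map type `id` onto candidate `cand`, optionally pre-checking. */
static void toy_dedup_one(const struct toy_type *types, int *map,
			  int id, int cand, bool strict)
{
	if (strict && !toy_shallow_equal(&types[id], &types[cand]))
		return;	/* candidate rejected, like the added `continue` */
	if (toy_is_equiv(&types[id], &types[cand]))
		map[id] = cand;
}

/* Scenario from the thread: id 0 is FWD x, id 1 is STRUCT x. */
static int toy_scenario(bool strict)
{
	struct toy_type types[2] = { { TOY_FWD, "x" }, { TOY_STRUCT, "x" } };
	int map[2] = { 0, 1 };

	map[0] = 1;	/* FWD x -> STRUCT x, recorded while deduping struct s */
	toy_dedup_one(types, map, 1, 0, strict);	/* FWD x is a candidate */
	return toy_resolve(map, 2, 0);
}
```

In this model toy_scenario(false) ends with FWD -> STRUCT -> FWD and the resolver gives up with -1, while toy_scenario(true) rejects the FWD candidate and both ids resolve to the STRUCT (id 1).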