On Thu, Apr 1, 2021 at 2:41 PM Yonghong Song <yhs@xxxxxx> wrote: > > > > On 4/1/21 2:36 PM, Yonghong Song wrote: > > With latest bpf-next built with clang lto (thin or full), I hit one test > > failures: > > $ ./test_progs -t tcp > > ... > > libbpf: extern (func ksym) 'tcp_slow_start': func_proto [23] incompatible with kernel [115303] > > libbpf: failed to load object 'bpf_cubic' > > libbpf: failed to load BPF skeleton 'bpf_cubic': -22 > > test_cubic:FAIL:bpf_cubic__open_and_load failed > > #9/2 cubic:FAIL > > ... > > > > The reason of the failure is due to bpf program 'tcp_slow_start' > > func signature is different from vmlinux BTF. bpf program uses > > the following signature: > > extern __u32 tcp_slow_start(struct tcp_sock *tp, __u32 acked); > > which is identical to the kernel definition in linux:include/net/tcp.h: > > u32 tcp_slow_start(struct tcp_sock *tp, u32 acked); > > While vmlinux BTF definition like: > > [115303] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2 > > 'tp' type_id=39373 > > 'acked' type_id=18 > > [115304] FUNC 'tcp_slow_start' type_id=115303 linkage=static > > The above is dumped with `bpftool btf dump file vmlinux`. > > You can see the ret_type_id is 0 and this caused the problem. > > > > Looking at dwarf, we have: > > > > 0x11f2ec67: DW_TAG_subprogram > > DW_AT_low_pc (0xffffffff81ed2330) > > DW_AT_high_pc (0xffffffff81ed235c) > > DW_AT_frame_base () > > DW_AT_GNU_all_call_sites (true) > > DW_AT_abstract_origin (0x11f2ed66 "tcp_slow_start") > > ... > > 0x11f2ed66: DW_TAG_subprogram > > DW_AT_name ("tcp_slow_start") > > DW_AT_decl_file ("/home/yhs/work/bpf-next/net/ipv4/tcp_cong.c") > > DW_AT_decl_line (392) > > DW_AT_prototyped (true) > > DW_AT_type (0x11f130c2 "u32") > > DW_AT_external (true) > > DW_AT_inline (DW_INL_inlined) > > David, > > Could you help confirm whether DW_AT_abstract_origin at a > DW_TAG_subprogram always points to another DW_TAG_subprogram, > or there are possible other cases? That's correct, so far as I understand the spec, specifically DWARFv5 3.3.8.3 says: "The root entry for a concrete out-of-line instance of a given inlined subroutine has the same tag as does its associated (abstract) inlined subroutine entry (that is, tag DW_TAG_subprogram rather than DW_TAG_inlined_subroutine)." Though people may come up with novel uses of DWARF features. What would happen if this constraint were violated/what's your motivation for asking (I don't quite understand the connection between test_progs failure description, and this question) - David > > Thanks, > > > > > We have a subprogram which has an abstract_origin pointing to > > the subprogram prototype with return type. Current one pass > > recoding cannot easily resolve this easily since > > at the time recoding for 0x11f2ec67, the return type in > > 0x11f2ed66 has not been resolved. > > > > To simplify implementation, I just added another pass to > > go through all functions after recoding pass. This should > > resolve the above issue. > > > > With this patch, among total 250999 functions in vmlinux, > > 4821 functions needs return type adjustment from type id 0 > > to correct values. The above failed bpf selftest passed too. > > > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > > --- > > dwarf_loader.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 46 insertions(+) > > > > Arnaldo, this is the last known pahole bug in my hand w.r.t. clang > > LTO. With this, all self tests are passed except ones due > > to global function inlining, static variable promotion etc, which > > are not related to pahole. > > > > diff --git a/dwarf_loader.c b/dwarf_loader.c > > index 026d137..367ac06 100644 > > --- a/dwarf_loader.c > > +++ b/dwarf_loader.c > > @@ -2198,6 +2198,42 @@ out: > > return 0; > > } > > > > +static int cu__resolve_func_ret_types(struct cu *cu) > > +{ > > + struct ptr_table *pt = &cu->functions_table; > > + uint32_t i; > > + > > + for (i = 0; i < pt->nr_entries; ++i) { > > + struct tag *tag = pt->entries[i]; > > + > > + if (tag == NULL || tag->type != 0) > > + continue; > > + > > + struct function *fn = tag__function(tag); > > + if (!fn->abstract_origin) > > + continue; > > + > > + struct dwarf_tag *dtag = tag->priv; > > + struct dwarf_tag *dfunc; > > + dfunc = dwarf_cu__find_tag_by_ref(cu->priv, &dtag->abstract_origin); > > + if (dfunc == NULL) { > > + tag__print_abstract_origin_not_found(tag); > > + return -1; > > + } > > + > > + /* > > + * Based on what I see it should be a subprogram, > > + * but double check anyway to ensure I won't mess up > > + * now and in the future. > > + */ > > + if (dfunc->tag->tag != DW_TAG_subprogram) > > + continue; > > + > > + tag->type = dfunc->tag->type; > > + } > > + return 0; > > +} > > + > > static int cu__recode_dwarf_types_table(struct cu *cu, > > struct ptr_table *pt, > > uint32_t i) > > @@ -2637,6 +2673,16 @@ static int cus__merge_and_process_cu(struct cus *cus, struct conf_load *conf, > > /* process merged cu */ > > if (cu__recode_dwarf_types(cu) != LSK__KEEPIT) > > return DWARF_CB_ABORT; > > + > > + /* > > + * for lto build, the function return type may not be > > + * resolved due to the return type of a subprogram is > > + * encoded in another subprogram through abstract_origin > > + * tag. Let us visit all subprograms again to resolve this. > > + */ > > + if (cu__resolve_func_ret_types(cu) != LSK__KEEPIT) > > + return DWARF_CB_ABORT; > > + > > if (finalize_cu_immediately(cus, cu, dcu, conf) > > == LSK__STOP_LOADING) > > return DWARF_CB_ABORT; > >