Re: [PATCH dwarves] dwarf_loader: handle subprogram ret type with abstract_origin properly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Em Thu, Apr 01, 2021 at 05:00:46PM -0700, David Blaikie escreveu:
> On Thu, Apr 1, 2021 at 4:41 PM Yonghong Song <yhs@xxxxxx> wrote:
> >
> >
> >
> > On 4/1/21 3:27 PM, David Blaikie wrote:
> > > On Thu, Apr 1, 2021 at 2:41 PM Yonghong Song <yhs@xxxxxx
> > > <mailto:yhs@xxxxxx>> wrote:
> > >  >
> > >  >
> > >  >
> > >  > On 4/1/21 2:36 PM, Yonghong Song wrote:
> > >  > > With latest bpf-next built with clang lto (thin or full), I hit one
> > > test
> > >  > > failures:
> > >  > >    $ ./test_progs -t tcp
> > >  > >    ...
> > >  > >    libbpf: extern (func ksym) 'tcp_slow_start': func_proto [23]
> > > incompatible with kernel [115303]
> > >  > >    libbpf: failed to load object 'bpf_cubic'
> > >  > >    libbpf: failed to load BPF skeleton 'bpf_cubic': -22
> > >  > >    test_cubic:FAIL:bpf_cubic__open_and_load failed
> > >  > >    #9/2 cubic:FAIL
> > >  > >    ...
> > >  > >
> > >  > > The reason of the failure is due to bpf program 'tcp_slow_start'
> > >  > > func signature is different from vmlinux BTF. bpf program uses
> > >  > > the following signature:
> > >  > >    extern __u32 tcp_slow_start(struct tcp_sock *tp, __u32 acked);
> > >  > > which is identical to the kernel definition in linux:include/net/tcp.h:
> > >  > >    u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
> > >  > > While vmlinux BTF definition like:
> > >  > >    [115303] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
> > >  > >            'tp' type_id=39373
> > >  > >            'acked' type_id=18
> > >  > >    [115304] FUNC 'tcp_slow_start' type_id=115303 linkage=static
> > >  > > The above is dumped with `bpftool btf dump file vmlinux`.
> > >  > > You can see the ret_type_id is 0 and this caused the problem.
> > >  > >
> > >  > > Looking at dwarf, we have:
> > >  > >
> > >  > > 0x11f2ec67:   DW_TAG_subprogram
> > >  > >                  DW_AT_low_pc    (0xffffffff81ed2330)
> > >  > >                  DW_AT_high_pc   (0xffffffff81ed235c)
> > >  > >                  DW_AT_frame_base        ()
> > >  > >                  DW_AT_GNU_all_call_sites        (true)
> > >  > >                  DW_AT_abstract_origin   (0x11f2ed66 "tcp_slow_start")
> > >  > > ...
> > >  > > 0x11f2ed66:   DW_TAG_subprogram
> > >  > >                  DW_AT_name      ("tcp_slow_start")
> > >  > >                  DW_AT_decl_file
> > > ("/home/yhs/work/bpf-next/net/ipv4/tcp_cong.c")
> > >  > >                  DW_AT_decl_line (392)
> > >  > >                  DW_AT_prototyped        (true)
> > >  > >                  DW_AT_type      (0x11f130c2 "u32")
> > >  > >                  DW_AT_external  (true)
> > >  > >                  DW_AT_inline    (DW_INL_inlined)
> > >  >
> > >  > David,
> > >  >
> > >  > Could you help confirm whether DW_AT_abstract_origin at a
> > >  > DW_TAG_subprogram always points to another DW_TAG_subprogram,
> > >  > or there are possible other cases?
> > >
> > > That's correct, so far as I understand the spec, specifically DWARFv5
> > > <http://dwarfstd.org/doc/DWARF5.pdf>
> > > 3.3.8.3 says:
> > >
> > > "The root entry for a concrete out-of-line instance of a given inlined
> > > subroutine has the same tag as does its associated (abstract) inlined
> > > subroutine entry (that is, tag DW_TAG_subprogram rather than
> > > DW_TAG_inlined_subroutine)."
> >
> > Thanks. That means that some of my codes in the patch is
> > dead code.
> >
> > >
> > > Though people may come up with novel uses of DWARF features. What would
> > > happen if this constraint were violated/what's your motivation for
> > > asking (I don't quite understand the connection between test_progs
> > > failure description, and this question)
> >
> > I have some codes to check the tag associated with abstract_origin
> > for a subprogram must be a subprogram. Through experiment, I didn't
> > see a violation, so I wonder that I can get confirmation from you
> > and then I may delete that code.
> >
> > The test_progs failure exposed the bug, that is all.
> >
> > pahole cannot handle all weird usages of dwarf, so I think pahole
> > is fine only to support well-formed dwarf.
> 
> Sounds good. Thanks for the context!

David, since you took the time to go thru the changes and to agree that
Yonghong's fix is good, can I add a:

Acked-by: David Blaikie <dblaikie@xxxxxxxxx>

to this patch?

Maybe even a:

Reviewed-by: David Blaikie <dblaikie@xxxxxxxxx>

Thanks,

- Arnaldo
 
> >
> > >
> > > - David
> > >
> > >  >
> > >  > Thanks,
> > >  >
> > >  > >
> > >  > > We have a subprogram which has an abstract_origin pointing to
> > >  > > the subprogram prototype with return type. Current one pass
> > >  > > recoding cannot easily resolve this easily since
> > >  > > at the time recoding for 0x11f2ec67, the return type in
> > >  > > 0x11f2ed66 has not been resolved.
> > >  > >
> > >  > > To simplify implementation, I just added another pass to
> > >  > > go through all functions after recoding pass. This should
> > >  > > resolve the above issue.
> > >  > >
> > >  > > With this patch, among total 250999 functions in vmlinux,
> > >  > > 4821 functions needs return type adjustment from type id 0
> > >  > > to correct values. The above failed bpf selftest passed too.
> > >  > >
> > >  > > Signed-off-by: Yonghong Song <yhs@xxxxxx <mailto:yhs@xxxxxx>>
> > >  > > ---
> > >  > >   dwarf_loader.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  > >   1 file changed, 46 insertions(+)
> > >  > >
> > >  > > Arnaldo, this is the last known pahole bug in my hand w.r.t. clang
> > >  > > LTO. With this, all self tests are passed except ones due
> > >  > > to global function inlining, static variable promotion etc, which
> > >  > > are not related to pahole.
> > >  > >
> > >  > > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > >  > > index 026d137..367ac06 100644
> > >  > > --- a/dwarf_loader.c
> > >  > > +++ b/dwarf_loader.c
> > >  > > @@ -2198,6 +2198,42 @@ out:
> > >  > >       return 0;
> > >  > >   }
> > >  > >
> > >  > > +static int cu__resolve_func_ret_types(struct cu *cu)
> > >  > > +{
> > >  > > +     struct ptr_table *pt = &cu->functions_table;
> > >  > > +     uint32_t i;
> > >  > > +
> > >  > > +     for (i = 0; i < pt->nr_entries; ++i) {
> > >  > > +             struct tag *tag = pt->entries[i];
> > >  > > +
> > >  > > +             if (tag == NULL || tag->type != 0)
> > >  > > +                     continue;
> > >  > > +
> > >  > > +             struct function *fn = tag__function(tag);
> > >  > > +             if (!fn->abstract_origin)
> > >  > > +                     continue;
> > >  > > +
> > >  > > +             struct dwarf_tag *dtag = tag->priv;
> > >  > > +             struct dwarf_tag *dfunc;
> > >  > > +             dfunc = dwarf_cu__find_tag_by_ref(cu->priv,
> > > &dtag->abstract_origin);
> > >  > > +             if (dfunc == NULL) {
> > >  > > +                     tag__print_abstract_origin_not_found(tag);
> > >  > > +                     return -1;
> > >  > > +             }
> > >  > > +
> > >  > > +             /*
> > >  > > +              * Based on what I see it should be a subprogram,
> > >  > > +              * but double check anyway to ensure I won't mess up
> > >  > > +              * now and in the future.
> > >  > > +              */
> > >  > > +             if (dfunc->tag->tag != DW_TAG_subprogram)
> > >  > > +                     continue;
> > >  > > +
> > >  > > +             tag->type = dfunc->tag->type;
> > >  > > +     }
> > >  > > +     return 0;
> > >  > > +}
> > >  > > +
> > >  > >   static int cu__recode_dwarf_types_table(struct cu *cu,
> > >  > >                                       struct ptr_table *pt,
> > >  > >                                       uint32_t i)
> > >  > > @@ -2637,6 +2673,16 @@ static int cus__merge_and_process_cu(struct
> > > cus *cus, struct conf_load *conf,
> > >  > >       /* process merged cu */
> > >  > >       if (cu__recode_dwarf_types(cu) != LSK__KEEPIT)
> > >  > >               return DWARF_CB_ABORT;
> > >  > > +
> > >  > > +     /*
> > >  > > +      * for lto build, the function return type may not be
> > >  > > +      * resolved due to the return type of a subprogram is
> > >  > > +      * encoded in another subprogram through abstract_origin
> > >  > > +      * tag. Let us visit all subprograms again to resolve this.
> > >  > > +      */
> > >  > > +     if (cu__resolve_func_ret_types(cu) != LSK__KEEPIT)
> > >  > > +             return DWARF_CB_ABORT;
> > >  > > +
> > >  > >       if (finalize_cu_immediately(cus, cu, dcu, conf)
> > >  > >           == LSK__STOP_LOADING)
> > >  > >               return DWARF_CB_ABORT;
> > >  > >

-- 

- Arnaldo



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux