Re: [RFC/PATCH 3/3] btf_encoder: Func generation fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 12, 2020 at 1:14 PM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> On Thu, Nov 12, 2020 at 11:54:41AM -0800, Andrii Nakryiko wrote:
>
> SNIP
>
> > > @@ -624,32 +644,46 @@ int cu__encode_btf(struct cu *cu, int verbose, bool force,
> > >                 has_index_type = true;
> > >         }
> > >
> > > -       cu__for_each_function(cu, core_id, fn) {
> > > -               /*
> > > -                * The functions_cnt != 0 means we parsed all necessary
> > > -                * kernel symbols and we are using ftrace location filter
> > > -                * for functions. If it's not available keep the current
> > > -                * dwarf declaration check.
> > > -                */
> > > -               if (functions_cnt) {
> > > +       /*
> > > +        * The functions_cnt != 0 means we parsed all necessary
> > > +        * kernel symbols and we are using ftrace location filter
> > > +        * for functions. If it's not available keep the current
> > > +        * dwarf declaration check.
> > > +        */
> > > +       if (functions_cnt) {
> > > +               cu__for_each_function(cu, core_id, fn) {
> > > +                       struct elf_function *p;
> > > +                       struct elf_function key = { .name = function__name(fn, cu) };
> > > +                       int args_cnt = 0;
> > > +
> > >                         /*
> > > -                        * We check following conditions:
> > > -                        *   - argument names are defined
> > > -                        *   - there's symbol and address defined for the function
> > > -                        *   - function address belongs to ftrace locations
> > > -                        *   - function is generated only once
> > > +                        * Collect functions that match ftrace filter
> > > +                        * and pick the one with proper argument names.
> > > +                        * The BTF generation happens at the end in
> > > +                        * btf_encoder__encode function.
> > >                          */
> > > -                       if (!has_arg_names(cu, &fn->proto))
> > > +                       p = bsearch(&key, functions, functions_cnt,
> > > +                                   sizeof(functions[0]), functions_cmp);
> > > +                       if (!p)
> > >                                 continue;
> > > -                       if (!should_generate_function(btfe, function__name(fn, cu)))
> > > +
> > > +                       if (!has_arg_names(cu, &fn->proto, &args_cnt))
> >
> > So I can't unfortunately reproduce that GCC bug with DWARF info. What
> > was exactly the symptom? Maybe you can also share readelf -wi dump for
> > your problematic vmlinux?
>
> hum, can't see -wi working for readelf, however I placed my vmlinux
> in here:
>   http://people.redhat.com/~jolsa/vmlinux.gz
>
> the symptom is that resolve_btfids will fail kernel build:
>
>   BTFIDS  vmlinux
> FAILED unresolved symbol vfs_getattr
>
> because BTF data contains multiple FUNC records for same function
>
> and the problem is that declaration tag itself is missing:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060
> so pahole won't skip them
>
> the first workaround was to workaround that and go for function
> records that have code address attached, but that's buggy as well:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1890107
>
> then after some discussions we ended up using ftrace addresses
> as filter for what we want to display.. plus we got static functions
> as well
>
> however for this way we failed to get proper arguments ;-)

Right, I followed along overall, but forgot the details of the initial
problem. Thanks for the refresher. See below for my current thoughts
on dealing with all this.

>
> >
> > The reason I'm asking is because I wonder if we should still ignore
> > functions if fn->declaration is set. E.g., for the issue we
> > investigated yesterday, the function with no arguments has declaration
> > set to 1, so just ignoring it would solve the problem. I'm wondering
> > if it's enough to do just that instead of doing this whole delayed
> > function collection/processing.
> >
> > Also, I'd imagine the only expected cases where we can override  the
> > function (args_cnt > p->args_cnt) would be if p->args_cnt == 0, no?
>
> I don't know, because originally I'd expect that we would not see
> function record with zero args when it actualy has some
>
> > All other cases are either newly discovered "bogusness" of DWARF (and
> > would be good to know about this) or it's a name collision for
> > functions. Basically, before we go all the way to rework this again,
> > let's see if just skipping declarations would be enough?
>
> so there's actualy new developement today in:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060#c14
>
> gcc might actualy get fixed, so I think we could go back to
> using declaration tag and use ftrace as additional filter..
> at least this exercise gave us static functions ;-)
>
> however we still have fedora with disabled disabled CONFIG_DEBUG_INFO_BTF
> and will need to wait for that fix to enable that back

Right, we better have a more robust approach not relying on
not-yet-released GCC.

>
> >
> > >                                 continue;
> > > -               } else {
> > > +
> > > +                       if (!p->fn || args_cnt > p->args_cnt) {
> > > +                               p->fn = fn;
> > > +                               p->cu = cu;
> > > +                               p->args_cnt = args_cnt;
> > > +                               p->type_id_off = type_id_off;
> > > +                       }
> > > +               }
> > > +       } else {
> > > +               cu__for_each_function(cu, core_id, fn) {
> > >                         if (fn->declaration || !fn->external)
> > >                                 continue;
> > > +                       if (generate_func(btfe, cu, fn, type_id_off))
> > > +                               goto out;
> > >                 }
> >
> > I'm trending towards disliking this completely different fallback
> > mechanism. It saved bpf-next accidentally, but otherwise obscured the
> > issue and generally makes testing pahole with artificial binary BTFs
> > (from test programs) harder. How about we unify approaches, but just
> > use mcount symbols opportunistically, as an additional filter, if it's
> > available?
>
> ok
>
> >
> > With that, testing that we still handle functions with duplicate names
> > properly would be trivial (which I suspect we don't and we'll just
> > keep the one with more args now, right?) And it makes static functions
> > available for non-vmlinux binaries automatically (might be good or
> > bad, but still...).
>
> if we keep just the ftrace filter and rely on declaration tag,
> the args checking will go away right?

So I looked at your vmlinux image. I think we should just keep
everything mostly as it it right now (without changes in this patch),
but add just two simple checks:

1. Skip if fn->declaration (ignore correctly marked func declarations)
2. Skip if DW_AT_inline: 1 (ignore inlined functions).

I'd keep the named arguments check as is, I think it's helpful. 1)
will skip stuff that's explicitly marked as declaration. 2) inline
check will partially mitigate dropping of fn->external check (and we
can't really attach to inlined functions). So will the named args
check as well. Then we should do mcount checks, **if** mcount symbols
are present (which should always be the case for any reasonable
vmlinux image that's supposed to be used with BPF, I think).

So together, it should cover all known issues, I think. And then we'll
just have to watch out for any new ones. GCC bugfix is good, but it's
too late: the harm is done and there are compilers out there that we
should deal with.

>
> jirka
>



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux