Re: Per-CPU variables in modules and pahole

On Thu, Dec 10, 2020 at 6:56 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Dec 10, 2020 at 10:29 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> > > > > Hi,
> > > > >
> > > > > I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> > > > > prerequisite for that is BTF data for .data..percpu data section and
> > > > > variables inside that.
> > > > >
> > > > > Turns out, pahole doesn't currently emit any BTF information for such
> > > > > variables in kernel modules. And the reason why is quite confusing and
> > > > > I can't figure it out myself, so was hoping someone else might be able
> > > > > to help.
> > > > >
> > > > > To repro, you can take latest bpf-next tree and add this to
> > > > > bpf_testmod/bpf_testmod.c inside selftests/bpf:
> > > > >
> > > > > $ git diff bpf_testmod/bpf_testmod.c
> > > > > > diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > index 2df19d73ca49..b2086b798019 100644
> > > > > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > @@ -3,6 +3,7 @@
> > > > >  #include <linux/error-injection.h>
> > > > >  #include <linux/init.h>
> > > > >  #include <linux/module.h>
> > > > > +#include <linux/percpu-defs.h>
> > > > >  #include <linux/sysfs.h>
> > > > >  #include <linux/tracepoint.h>
> > > > >  #include "bpf_testmod.h"
> > > > > @@ -10,6 +11,10 @@
> > > > >  #define CREATE_TRACE_POINTS
> > > > >  #include "bpf_testmod-events.h"
> > > > >
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> > > > > +
> > > > >  noinline ssize_t
> > > > >  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> > > > >                       struct bin_attribute *bin_attr,
> > > > >
> > > > > 1. So the very first issue (that I'm going to ignore for now) is that
> > > > > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > > > > would be ignored by the current pahole logic. So we need to fix that
> > > > > for modules. Adding dummy1 and dummy2 takes care of this for now,
> > > > > bpf_testmod_ksym_percpu has offset 4.
> > > >
> > > > I removed that addr zero check in the modules changes, but only
> > > > when collecting functions; it's still there in collect_percpu_var
> > >
> > > Hao had some reason to skip per-cpu variables with offset 0, maybe he
> > > can comment on that before we change it.
> > >
> >
> > When I initially wrote that check, I saw that there were multiple
> > symbols of the same name associated with a single variable, but only
> > one of them had a non-zero address. Besides, there are symbols that
> > aren't associated with any variable and they have zero address. For
> > example, those defined as __ADDRESSABLE(sym) and __UNIQUE_ID(prefix).
> > There are quite a lot of them, as I remember. So I filtered out the
> > zero address for the purpose of accelerating encoding. I noticed that
> > on x86_64, the first page of the percpu section is reserved, so I
> > deemed that symbols of normal interest should have positive addresses.
>
> So I just checked my local vmlinux image, and seems like the only one
> with addr == 0 is fixed_percpu_data. Everything else that's detected
> as belonging to .data..percpu section looks sane and has non-zero
> offset.
>
> So I think this might have been the case before we switched to using
> ELF symbols and now it's not? I think I'll just drop this check, will
> post the patch, and would really appreciate if you can test it in your
> environment. Does that sound ok?

Ah, never mind. While ELF symbols look good, it's the DWARF variables
side where the problem is. There are lots of DWARF variables that map
to addr 0 and which are impossible to distinguish from the real
fixed_percpu_data, because we can't even rely on getting the DWARF
variable name.

I guess I'll leave it as is for now, but we should come up with some
solution, ideally.
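
To make the ambiguity concrete, here is a minimal sketch of the kind of
address-based matching being discussed. It is not the actual pahole code:
struct percpu_sym, match_percpu_var() and the table contents are
hypothetical names made up for illustration. The DWARF variable is
identified by address only, since its name can't be relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one ELF symbol found in .data..percpu. */
struct percpu_sym {
	uint64_t addr;      /* symbol value (offset within the per-CPU area) */
	const char *name;
};

/*
 * Hypothetical lookup: map a DWARF variable's address to an ELF per-CPU
 * symbol. Address 0 is rejected up front, because many unrelated DWARF
 * variables also report address 0 and the address is all we have.
 */
static const struct percpu_sym *
match_percpu_var(const struct percpu_sym *syms, size_t nr_syms,
		 uint64_t dwarf_addr)
{
	assert(syms != NULL || nr_syms == 0);

	if (dwarf_addr == 0)
		return NULL;	/* ambiguous: real variable or noise? */

	for (size_t i = 0; i < nr_syms; i++)
		if (syms[i].addr == dwarf_addr)
			return &syms[i];

	return NULL;		/* no per-CPU symbol at this address */
}
```

With a table containing fixed_percpu_data at 0, a lookup for address 0
returns NULL, so the real variable is skipped together with the bogus
ones. That is the trade-off of keeping the check as is.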

>
> >
> > >
> > > >
> > > > >
> > > > > 2. Second issue is more interesting. Somehow, when pahole iterates
> > > > > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > > > > reported as 0x10e74, not 4. Which totally confuses pahole because
> > > > > according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> > > > > I tracked this down to dwarf_getlocation() returning 10e74 as number
> > > > > field in expr.
> > > >
> > > > in which place do you see that address? when I print the
> > > > address from collect_percpu_var it shows 4
> > >
> > > yes, ELF symbol's value is 4, but when iterating DWARF variables
> > > (0x10e70 + 4) is returned. It does look like a special handling of
> > > modules. I missed that libdw does some special things for specifically
> > > modules. Further debugging yesterday showed that 0x10e70 roughly
> > > corresponds to the offset of .data..percpu if you count all the
> > > allocatable data sections that come before it. So I think you are
> > > right. We should probably centralize the logic of kernel module
> > > detection so that we can handle these module vs non-module differences
> > > properly.
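
The arithmetic above can be sketched as follows, assuming the module's
.data..percpu section really starts at 0x10e70 in the address space that
libdw reports (that base is only the rough value from my debugging, and
the 12-byte size is just the three ints from the repro).
percpu_section_offset() is a hypothetical helper, not something pahole
has today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate an address reported for a DWARF variable
 * in a kernel module back into a section-relative offset, so that it can
 * be compared with the ELF symbol's value. Returns false if the address
 * does not fall inside the section.
 */
static bool percpu_section_offset(uint64_t dwarf_addr, uint64_t sec_base,
				  uint64_t sec_size, uint64_t *offset)
{
	assert(offset);

	if (dwarf_addr < sec_base || dwarf_addr - sec_base >= sec_size)
		return false;

	*offset = dwarf_addr - sec_base;
	return true;
}
```

Under those assumptions, percpu_section_offset(0x10e74, 0x10e70, 12, &off)
succeeds and sets off to 4, which matches the ELF symbol value of
bpf_testmod_ksym_percpu.
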
> > >
> > > >
> > > > not sure this is related but looks like similar issue I had to
> > > > solve for modules functions, as described in the changelog:
> > > > (not merged yet)
> > > >
> > > >     btf_encoder: Detect kernel module ftrace addresses
> > > >
> > > >     ...
> > > >     There's one tricky point with kernel modules wrt Elf object,
> > > >     which we get from dwfl_module_getelf function. This function
> > > >     performs all possible relocations, including __mcount_loc
> > > >     section.
> > > >
> > > >     So addrs array contains relocated values, which we need take
> > > >     into account when we compare them to functions values which
> > > >     are relative to their sections.
> > > >     ...
> > > >
> > > > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > > > because not sure where you see that address exactly
> > >
> > >
> > > It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
> > >
> > > >
> > > > jirka
> > > >



