Re: Encoding BTF information from DWARF causes "has void type" error.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/02/2024 14:23, Arnaldo Carvalho de Melo wrote:
> On Tue, Feb 06, 2024 at 10:42:44AM -0300, Arnaldo Carvalho de Melo wrote:
>> On Tue, Feb 06, 2024 at 01:02:26PM +0100, Sebastian Andrzej Siewior wrote:
>>> with linux kernel v6.8-rc2-RT and the following file as testcase
>>>
>>> ----->8--------
>>>
>>> #include <linux/spinlock.h>
>>> #include <linux/local_lock.h>
>>>
>>> struct per_cpu_struct {
>>>         local_lock_t lock;
>>>         unsigned int something;
>>> };
>>>
>>> static DEFINE_PER_CPU(struct per_cpu_struct, per_cpu_struct) = {
>>>         .lock = INIT_LOCAL_LOCK(lock),
>>> };
>>>
>>> DEFINE_GUARD(ll_lock, local_lock_t __percpu*,
>>>              local_lock(_T),
>>>              local_unlock(_T))
>>>
>>> void function(void);
>>> void function(void)
>>> {
>>>         struct per_cpu_struct *pcs = this_cpu_ptr(&per_cpu_struct);
>>>
>>>         guard(ll_lock)(&per_cpu_struct.lock);
>>>         pcs->something++;
>>> }
>>>
>>> -----8<--------
>>>
>>> compiling and running pahole afterwards:
>>> | make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust  kernel/pahole-tc.o
>>> |   CC      kernel/pahole-tc.o
>>> | error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
>>>
>>> This doesn't look good. If I swap the order of "lock" and "something"
>>> within per_cpu_struct then it goes away.
>>> The dwarf/die object it complains about has only DW_AT_abstract_origin
>>> and DW_AT_location set.
>

Thanks for reporting and providing all this data!

The eventual generated BTF might give us some clues, running

$ pahole -J pahole-tc-good-swap.o
$ pahole -J pahole-tc-bad.o
 ...and dumping raw BTF we see that in the bad case, we've got 3
variables of different types called "per_cpu_ptr":

$ bpftool btf dump file pahole-tc-bad.o |grep per_cpu_ptr

[88] STRUCT 'per_cpu_struct' size=136 vlen=2
[94] VAR 'per_cpu_struct' type_id=88, linkage=static
[95] VAR 'per_cpu_struct' type_id=89, linkage=static
[96] VAR 'per_cpu_struct' type_id=87, linkage=static
	type_id=94 offset=0 size=136 (VAR 'per_cpu_pe struct')
	type_id=95 offset=0 size=136 (VAR 'per_cpu_struct')
	type_id=96 offset=0 size=136 (VAR 'per_cpu_struct')

The first variable [94] is of type "struct per_cpu_struct", the second is

[89] TYPEDEF 'class_ll_lock_t' type_id=87

...and the third is type id 87:

[86] TYPEDEF 'local_lock_t' type_id=73
[87] PTR '(anon)' type_id=86

which is a pointer to a 'typedef local_local_t'.

Contrast this to the good case, where only 1 is seen:

$ bpftool btf dump file pahole-tc-good-swap.o format raw |grep per_cpu
[88] STRUCT 'per_cpu_struct' size=136 vlen=2
[94] VAR 'per_cpu_struct' type_id=88, linkage=static
	type_id=94 offset=0 size=136 (VAR 'per_cpu_struct')

So I think as you suspected we oddly seem to be conflating the
per_cpu_struct variable with its first member somehow, and
worse doing it twice; once for the typedef class_ll_lock_t and
again for the ptr to local_lock_t.

The DWARF looks right to me:

0x00000764:   DW_TAG_variable
                DW_AT_name      ("per_cpu_struct")
                DW_AT_decl_file
("/home/bigeasy/linux-stable-intree/kernel/pahole-tc.c")
                DW_AT_decl_line (9)
                DW_AT_decl_column       (0x08)
                DW_AT_type      (0x0000073c "per_cpu_struct")
                DW_AT_location  (DW_OP_addr 0x0)


Putting additional debugging in the BTF encoder:

diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..73ee6f7 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1520,6 +1520,9 @@ static int btf_encoder__encode_cu_variables(struct
btf_encoder *encoder)
                uint64_t addr;
                int id;

+               if (encoder->verbose)
+                       printf("processing var '%s' at pos %p, tag
0x%x\n", variable__name(var), pos, pos->tag);
+
                if (var->declaration && !var->spec)
                        continue;


...we see

search cu 'kernel/pahole-tc.c' for percpu global variables.
processing var 'current_stack_pointer' at pos 0xa14be0, tag 0x34
processing var 'this_cpu_off' at pos 0xa14cb0, tag 0x34
processing var 'per_cpu_struct' at pos 0xa201f0, tag 0x34
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0
encoded
[96] VAR per_cpu_struct type=90 linkage=0
processing var 'pcs' at pos 0xa20c20, tag 0x34
processing var '__UNIQUE_ID_guard44' at pos 0xa20cf0, tag 0x34
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0
encoded
[97] VAR per_cpu_struct type=91 linkage=0
processing var '__vpp_verify' at pos 0xa20e60, tag 0x34
processing var 'tcp_ptr__' at pos 0xa20fd0, tag 0x34
processing var '(null)' at pos 0xa21130, tag 0x34
error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that
has void type


Looking at the above, we see it's the __UNIQUE_ID prefixed variables
that cause the trouble; for each of these we wind up encoding an
instance of the per_cpu_struct variable.

We have a special case meant to handle this, but it is triggering for
the __UNIQUE_ID variables:

		/* A lot of "special" DWARF variables (e.g, __UNIQUE_ID___xxx)
                 * have addr == 0, which is the same as, say, valid
                 * fixed_percpu_data per-CPU variable. To distinguish
between
                 * them, additionally compare DWARF and ELF symbol names. If
                 * DWARF doesn't provide proper name, pessimistically assume
                 * bad variable.
                 *
                 * Examples of such special variables are:
                 *
                 *  1. __ADDRESSABLE(sym), which are forcely emitted as
symbols.
                 *  2. __UNIQUE_ID(prefix), which are introduced to
generate unique ids.
                 *  3. __exitcall(fn), functions which are labeled as
exit calls.
                 *
                 *  This is relevant only for vmlinux image, as for kernel
                 *  modules per-CPU data section has non-zero offset so all
                 *  per-CPU symbols have non-zero values.
                 */
                if (var->ip.addr == 0) {
                        if (!dwarf_name || strcmp(dwarf_name, name))
                                continue;
                }

So I suspect the problem is

 if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name)

...which uses the associated address to look up the name, and then goes
on to use that name to encode.

The problem is the conditional check above; I can't see why it needs to
be guarded here. We never want to do an address-based lookup for a
variable and not have it be the same variable name as we have from DWARF
- which is the one we're trying to encode, right? The following fixes
the issue for me, can you try this out if you get a chance? I might be
missing something about that check so other folks please do weigh in if
something looks broken. Thanks!

diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..aecb311 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1559,10 +1559,8 @@ static int
btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
                 *  modules per-CPU data section has non-zero offset so all
                 *  per-CPU symbols have non-zero values.
                 */
-               if (var->ip.addr == 0) {
-                       if (!dwarf_name || strcmp(dwarf_name, name))
-                               continue;
-               }
+               if (!dwarf_name || strcmp(dwarf_name, name))
+                       continue;

                if (var->spec)
                        var = var->spec;


> Are you sure?
> 
> What I see:
> 
> ⬢[acme@toolbox pahole]$ cp sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o .
> ⬢[acme@toolbox pahole]$ pahole --btf_encode pahole-tc-bad.o
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> ⬢[acme@toolbox pahole]$
> 
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep DW_TAG_variable -A5 | grep -w per_cpu_struct -B1 -A4
>  <1><764>: Abbrev Number: 29 (DW_TAG_variable)
>     <765>   DW_AT_name        : (indirect string, offset: 0x184): per_cpu_struct
>     <769>   DW_AT_decl_file   : 1
>     <769>   DW_AT_decl_line   : 9
>     <76a>   DW_AT_decl_column : 8
>     <76b>   DW_AT_type        : <0x73c>
> ⬢[acme@toolbox pahole]$
> 
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep '<73c>.*DW_TAG_structure_type' -A7
>  <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
>     <73d>   DW_AT_name        : (indirect string, offset: 0x184): per_cpu_struct
>     <741>   DW_AT_byte_size   : 136
>     <742>   DW_AT_decl_file   : 1
>     <743>   DW_AT_decl_line   : 4
>     <744>   DW_AT_decl_column : 8
>     <745>   DW_AT_sibling     : <0x764>
>  <2><749>: Abbrev Number: 1 (DW_TAG_member)
> ⬢[acme@toolbox pahole]$
> 
> But when using verbose mode:
> 
> [95] FUNC function type_id=94
> search cu 'kernel/pahole-tc.c' for percpu global variables.
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [96] VAR per_cpu_struct type=90 linkage=0
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [97] VAR per_cpu_struct type=91 linkage=0
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> [98] DATASEC .data..percpu size=136 vlen=2
> 	type=96 offset=0 size=136
> 	type=97 offset=0 size=136
> ⬢[acme@toolbox pahole]$ pahole -V --btf_encode pahole-tc-bad.o
> 
> It ends up considering two variables with that name?
> 
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep per_cpu_struct -B1 -A5
>  <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
>     <73d>   DW_AT_name        : (indirect string, offset: 0x184): per_cpu_struct
>     <741>   DW_AT_byte_size   : 136
>     <742>   DW_AT_decl_file   : 1
>     <743>   DW_AT_decl_line   : 4
>     <744>   DW_AT_decl_column : 8
>     <745>   DW_AT_sibling     : <0x764>
> --
>  <1><764>: Abbrev Number: 29 (DW_TAG_variable)
>     <765>   DW_AT_name        : (indirect string, offset: 0x184): per_cpu_struct
>     <769>   DW_AT_decl_file   : 1
>     <769>   DW_AT_decl_line   : 9
>     <76a>   DW_AT_decl_column : 8
>     <76b>   DW_AT_type        : <0x73c>
>     <76f>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 	(DW_OP_addr: 0)
> ⬢[acme@toolbox pahole]$
> 
> Just dumping some more info in case Jiri or Alan thinks about something.
> 
> - Arnaldo
> 
>>> At this point I am not sure if gcc wrongly created the dwarf information
>>> or of pahole accidentally matches the internal local_lock_t member as
>>> per_cpu_struct because it starts the same address.
>>>
>>> >From dumpwarf the problematic entry is:
>>> | 0x0000085b:       DW_TAG_variable
>>> |                     DW_AT_abstract_origin       (0x0000099e "t")
>>> |                     DW_AT_location      (DW_OP_addr 0x0, DW_OP_stack_value)
>>>
>>> which looks slightly different in the good case:
>>> | 0x00000852:         DW_TAG_variable
>>> |                       DW_AT_abstract_origin     (0x00000980 "t")
>>> |                       DW_AT_location    (0x00000040: 
>>> |                          [0x0000000000000046, 0x000000000000005b): DW_OP_reg3 RBX
>>> |                          [0x000000000000005b, 0x0000000000000061): DW_OP_addr 0x0, DW_OP_stack_value)
>>> |                       DW_AT_GNU_locviews        (0x0000003c)
>>>
>>> I made an archive with c file, the compiled version in bad and good case
>>> (swapped the first two members), the dumpwarf output and uploaded to
>>> 	https://breakpoint.cc/pahole-tc.tar.xz
>>
>> Lots of great info, I'm on it.
>>
>> - Arnaldo




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux