Re: Encoding BTF information from DWARF causes "has void type" error.

Alan Maguire <alan.maguire@xxxxxxxxxx> · Wed, 7 Feb 2024 11:03:15 +0000

On 07/02/2024 10:01, Alan Maguire wrote:
> The problem is the conditional check above; I can't see why it needs to
> be guarded here. We never want to do an address-based lookup for a
> variable and not have it be the same variable name as we have from DWARF
> - which is the one we're trying to encode, right? The following fixes
> the issue for me, can you try this out if you get a chance? I might be
> missing something about that check so other folks please do weigh in if
> something looks broken. Thanks!
> 

Apologies, but I believe my suggested fix is wrong. When tested with
vmlinux BTF generation, around 100 per-cpu variables disappeared.
I believe the fix should instead be the following (deliberately leaving
in verbose output to help debug future issues):

diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..ec081ce 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1527,6 +1527,10 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
                if (variable__scope(var) != VSCOPE_GLOBAL && !var->spec)
                        continue;

+               if (encoder->verbose)
+                       printf("processing var '%s' at addr 0x%lx\n",
+                              variable__name(var), var->ip.addr);
+
                /* addr has to be recorded before we follow spec */
                addr = var->ip.addr;
                dwarf_name = variable__name(var);
@@ -1559,7 +1563,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
                 *  modules per-CPU data section has non-zero offset so all
                 *  per-CPU symbols have non-zero values.
                 */
-               if (var->ip.addr == 0) {
+               if (addr == 0) {
                        if (!dwarf_name || strcmp(dwarf_name, name))
                                continue;
                }

This works for both vmlinux generation and the problem you ran into.

The explanation is that before this check we compute an adjusted value
for "addr" to do the per-cpu variable lookup, by subtracting the per-cpu
base address. We should use that same adjusted address, rather than
var->ip.addr, when testing for a zero address to decide whether we need
names to resolve per-cpu variable identity. The problem was that the
__UNIQUE_ID variable had a non-zero var->ip.addr value, but its value
relative to the per-cpu base was 0. With the adjusted value, we do the
name matching and skip the encoding, as intended I believe.
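
To make the reasoning concrete, here is a minimal standalone sketch of
the check. It is not pahole code: should_skip_var() and its parameters
are hypothetical names made up for illustration, and it only models the
"subtract the per-cpu base, then test for zero" logic described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pahole): returns true when a DWARF
 * variable should be skipped instead of being matched to the ELF symbol
 * called elf_name.
 */
bool should_skip_var(uint64_t dwarf_addr, uint64_t percpu_base,
		     const char *dwarf_name, const char *elf_name)
{
	/* Address relative to the per-cpu base, as used for the lookup. */
	uint64_t addr = dwarf_addr - percpu_base;

	assert(elf_name != NULL);

	/*
	 * A relative address of 0 cannot identify the variable on its
	 * own, so require the DWARF name to match the symbol name.
	 */
	if (addr == 0)
		return !dwarf_name || strcmp(dwarf_name, elf_name) != 0;

	return false;
}
```

Testing the raw DWARF address instead of the adjusted one would miss
the case where the raw address is non-zero but equals the per-cpu base,
which is the situation the __UNIQUE_ID variable was in.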

Alan