On 07/02/2024 12:30, Sebastian Andrzej Siewior wrote: > On 2024-02-07 11:03:15 [+0000], Alan Maguire wrote: >> On 07/02/2024 10:01, Alan Maguire wrote: >>> The problem is the conditional check above; I can't see why it needs to >>> be guarded here. We never want to do an address-based lookup for a >>> variable and not have it be the same variable name as we have from DWARF >>> - which is the one we're trying to encode, right? The following fixes >>> the issue for me, can you try this out if you get a chance? I might be >>> missing something about that check so other folks please do weigh in if >>> something looks broken. Thanks! > > So the former patch worked. > >> apologies, but my suggested fix is wrong I believe. When tested with >> vmlinux BTF generation, around 100 per-cpu variables disappeared. >> I believe the fix should instead be (deliberately leaving in >> verbose output to help debug future issues): > > no worries. > > … >> This works for both vmlinux generation and the problem you ran into. > > the .o file was a reduced testcase. It works with the "complete" file > here, too. However once that file is merged into vmlinux.o I get the error > again: > | $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/skbuff.o > | $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/vmlinux.o --btf_encode_force > | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type > | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type > | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type > | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type > > The -V output is huge and I don't really know what to look at. The > skbuff.o skipped __UNIQUE_ID_guard826 and a few "NULLs" and the address > for __UNIQUE_ID_guard826 was 0. In the vmlinux.o the address for > __UNIQUE_ID_guard826 is 19440. > Thanks for trying this out! From what I can see, when the individual objects are linked we no longer have the offset 0 that triggers the name comparison. On the other side though, if we are not selective in doing these name comparisons, we lose encoding of a fair few legitimate per-cpu variables in vmlinux BTF. So the question is how to thread the needle in ensuring that the matches are accurate for cases like the above while also ensuring we don't leave any legitimate variables out. I don't have a good answer to that yet I'm afraid. I think the problem of missing variables when the check is unselective _may_ boil down to cases where the DWARF name is NULL (due to it using an abstract origin reference); in such cases we cannot name-match and such variables would be skipped. We have some code in dwarf_loader.c that attempts to merge abstract origin references and their referents such that the name is filled in for the abstract origin-based variables, but it may be that we are missing a corner case somewhere. I'll dig into this further on Friday, but if anyone else has time to do some analysis here that would be great! Thanks! Alan > I uploaded the two .o at > https://breakpoint.cc/pahole-tc2.tar.xz > > it decompresses to 1.1GiB. > >> The explanation is that prior to this, we get an adjusted value for >> "addr" to do per-cpu variable lookup, where we subtract the per-cpu base >> address. We should also use that adjusted address to check for a zero >> address in the test to see if we need to use names to resolve per-cpu >> variable identity. The problem was the __UNIQUE_ID variable had a >> non-zero var->ip.addr value but its value relative to the per-cpu base >> was 0. With the adjusted value, we do the name matching and skip the >> encoding as intended I believe. >> >> Alan > > Sebastian