Re: Kernel build fail with 'btf_encoder__encode: btf__dedup failed!'

Hi Alex and Jiri,

On 2023/2/10 23:37, Alexandre Ferreira wrote:
> Jiri,
>
> On Fri, Feb 10, 2023 at 8:34 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>> On Fri, Feb 10, 2023 at 08:02:23AM -0600, Alexandre Peixoto Ferreira wrote:
>>> Alam,
>>>
>>> On 2/9/23 07:07, Alan Maguire wrote:
>>>> On 09/02/2023 04:15, Alexandre Peixoto Ferreira wrote:
>>>>> Jiri,
>>>>>
>>>>> On 1/31/23 09:18, Jiri Olsa wrote:
>>>>>> On Sat, Jan 28, 2023 at 01:23:25PM -0600, Alexandre Peixoto Ferreira wrote:
>>>>>>> Jirka and Daniel,
>>>>>>>
>>>>>>> On 1/27/23 18:00, Jiri Olsa wrote:
>>>>>>>> On Fri, Jan 27, 2023 at 04:28:54PM -0600, Alexandre Peixoto Ferreira wrote:
>>>>>>>>> On 1/24/23 00:13, Daniel Xu wrote:
>>>>>>>>>> Hi Jiri,
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 23, 2023, at 1:06 AM, Jiri Olsa wrote:
>>>>>>>>>>> On Sun, Jan 22, 2023 at 10:48:44AM -0700, Daniel Xu wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm getting the following error during build:
>>>>>>>>>>>>
>>>>>>>>>>>>              $ ./tools/testing/selftests/bpf/vmtest.sh -j30
>>>>>>>>>>>>              [...]
>>>>>>>>>>>>                BTF     .btf.vmlinux.bin.o
>>>>>>>>>>>>              btf_encoder__encode: btf__dedup failed!
>>>>>>>>>>>>              Failed to encode BTF
>>>>>>>>>>>>                LD      .tmp_vmlinux.kallsyms1
>>>>>>>>>>>>                NM      .tmp_vmlinux.kallsyms1.syms
>>>>>>>>>>>>                KSYMS   .tmp_vmlinux.kallsyms1.S
>>>>>>>>>>>>                AS      .tmp_vmlinux.kallsyms1.S
>>>>>>>>>>>>                LD      .tmp_vmlinux.kallsyms2
>>>>>>>>>>>>                NM      .tmp_vmlinux.kallsyms2.syms
>>>>>>>>>>>>                KSYMS   .tmp_vmlinux.kallsyms2.S
>>>>>>>>>>>>                AS      .tmp_vmlinux.kallsyms2.S
>>>>>>>>>>>>                LD      .tmp_vmlinux.kallsyms3
>>>>>>>>>>>>                NM      .tmp_vmlinux.kallsyms3.syms
>>>>>>>>>>>>                KSYMS   .tmp_vmlinux.kallsyms3.S
>>>>>>>>>>>>                AS      .tmp_vmlinux.kallsyms3.S
>>>>>>>>>>>>                LD      vmlinux
>>>>>>>>>>>>                BTFIDS  vmlinux
>>>>>>>>>>>>              FAILED: load BTF from vmlinux: No such file or directory
>>>>>>>>>>>>              make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255
>>>>>>>>>>>>              make[1]: *** Deleting file 'vmlinux'
>>>>>>>>>>>>              make: *** [Makefile:1264: vmlinux] Error 2
>>>>>>>>>>>>
>>>>>>>>>>>> This happens on both bpf-next/master (84150795a49) and 6.2-rc5
>>>>>>>>>>>> (2241ab53cb).
>>>>>>>>>>>>
>>>>>>>>>>>> I've also tried arch linux pahole 1:1.24+r29+g02d67c5-1 as well as
>>>>>>>>>>>> upstream pahole on master (02d67c5176) and upstream pahole on
>>>>>>>>>>>> next (2ca56f4c6f659).
>>>>>>>>>>>>
>>>>>>>>>>>> Of the above 6 combinations, I think I've tried all of them (maybe
>>>>>>>>>>>> missing 1 or 2).
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like GCC got updated recently on my machine, so perhaps
>>>>>>>>>>>> it's related?
>>>>>>>>>>>>
>>>>>>>>>>>>              CONFIG_CC_VERSION_TEXT="gcc (GCC) 12.2.1 20230111"
>>>>>>>>>>>>
>>>>>>>>>>>> I'll try some debugging, but just wanted to report it first.
>>>>>>>>>>> hi,
>>>>>>>>>>> I can't reproduce that.. can you reproduce it outside vmtest.sh?
>>>>>>>>>>>
>>>>>>>>>>> there will be lot of output with patch below, but could contain
>>>>>>>>>>> some more error output
>>>>>>>>>> Thanks for the hints. Doing a regular build outside of vmtest.sh
>>>>>>>>>> seems to work ok. So maybe it's a difference in the build config.
>>>>>>>>>>
>>>>>>>>>> I'll put a little more time into debugging to see if it goes anywhere.
>>>>>>>>>> But I'll have to get back to the regularly scheduled programming
>>>>>>>>>> soon.
>>>>>>>>> 6.2-rc5 compiles correctly when CONFIG_X86_KERNEL_IBT is commented but fails
>>>>>>>>> in pahole when CONFIG_X86_KERNEL_IBT is set.
>>>>>>>> could you plese attach your config and the build error?
>>>>>>>> I can't reproduce that
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> jirka
>>>>>>> My working .config is available at https://pastebin.pl/view/bef3765c
>>>>>>> change CONFIG_X86_KERNEL_IBT to y to get the error.
>>>>>>>
>>>>>>> The error is similar to Daniel's and is shown below:
>>>>>>>
>>>>>>>      LD      .tmp_vmlinux.btf
>>>>>>>      BTF     .btf.vmlinux.bin.o
>>>>>>> btf_encoder__encode: btf__dedup failed!
>>>>>>> Failed to encode BTF
>>>>>>>      LD      .tmp_vmlinux.kallsyms1
>>>>>>>      NM      .tmp_vmlinux.kallsyms1.syms
>>>>>>>      KSYMS   .tmp_vmlinux.kallsyms1.S
>>>>>>>      AS      .tmp_vmlinux.kallsyms1.S
>>>>>>>      LD      .tmp_vmlinux.kallsyms2
>>>>>>>      NM      .tmp_vmlinux.kallsyms2.syms
>>>>>>>      KSYMS   .tmp_vmlinux.kallsyms2.S
>>>>>>>      AS      .tmp_vmlinux.kallsyms2.S
>>>>>>>      LD      .tmp_vmlinux.kallsyms3
>>>>>>>      NM      .tmp_vmlinux.kallsyms3.syms
>>>>>>>      KSYMS   .tmp_vmlinux.kallsyms3.S
>>>>>>>      AS      .tmp_vmlinux.kallsyms3.S
>>>>>>>      LD      vmlinux
>>>>>>>      BTFIDS  vmlinux
>>>>>>> FAILED: load BTF from vmlinux: No such file or directory
>>>>>>> make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255
>>>>>>> make[1]: *** Deleting file 'vmlinux'
>>>>>>> make: *** [Makefile:1264: vmlinux] Error 2
>>>>>> I can't reproduce that.. I tried with gcc versions:
>>>>>>
>>>>>>      gcc (GCC) 13.0.1 20230117 (Red Hat 13.0.1-0)
>>>>>>      gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
>>>>>>
>>>>>> I haven't found fedora setup with 12.2.1 20230111 yet
>>>>>>
>>>>>> I tried alsa with latest pahole master branch
>>>>>>
>>>>>> were you guys able to get any more verbose output
>>>>>> that I suggested earlier?
>>>>>>
>>>>>> jirka
>>>>> I compiled with and without IBT using the -V on pahole (LLVM_OBJCOPY=objcopy pahole -V -J --btf_gen_floats -j .tmp_vmlinux.btf) and the outfiles are a little too big (540MB). The error happens with this CONST type pointing to itself. That does not happen with the IBT option removed.
>>>>>
>>>>> $ grep  -n "CONST (anon) type_id" /tmp/with_IBT  | more
>>>>> 346:[2] CONST (anon) type_id=2
>>>>> 349:[5] CONST (anon) type_id=5
>>>>> 351:[7] CONST (anon) type_id=7
>>>>> 356:[12] CONST (anon) type_id=12
>>>>> 363:[19] CONST (anon) type_id=19
>>>>> 373:[29] CONST (anon) type_id=29
>>>>> 375:[31] CONST (anon) type_id=31
>>>>> 409:[63] CONST (anon) type_id=63
>>>>> 444:[89] CONST (anon) type_id=0
>>>>> 472:[97] CONST (anon) type_id=97
>>>>> 616:[129] CONST (anon) type_id=129
>>>>> 652:[131] CONST (anon) type_id=131
>>>>> 1319:[234] CONST (anon) type_id=234
>>>>> 1372:[246] CONST (anon) type_id=246
>>>>> ....
>>>>>
>>>>> $diff -ru with_IBT without_IBT
>>>>> --- with_IBT 2023-01-31 09:39:24.915912735 -0600
>>>>> +++ without_IBT 2023-01-31 09:46:23.456005278 -0600
>>>>> @@ -340,346 +340,14800 @@
>>>>>    Found per-CPU symbol 'cpu_tlbstate_shared' at address 0x2c040
>>>>>    Found per-CPU symbol 'mce_poll_banks' at address 0x1ad20
>>>>>    Found 341 per-CPU variables!
>>>>> -Found 61470 functions!
>>>>> +Found 61462 functions!
>>>>> +File .tmp_vmlinux.btf:
>>>>> +[1] FUNC_PROTO (anon) return=0 args=(void)
>>>>> +[2] FUNC verify_cpu type_id=1
>>>>> +[3] FUNC_PROTO (anon) return=0 args=(void)
>>>>> +[4] FUNC sev_verify_cbit type_id=3
>>>>> +search cu 'arch/x86/kernel/head_64.S' for percpu global variables.
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>> +Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>> +Found per-CPU symbol 'cpu_tsc_khz' at address 0x19f88
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'current_tsc_ratio' at address 0x19fa0
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>> +Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>> +Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>> +Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> +Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>> +Found per-CPU symbol 'cpu_tsc_khz' at address 0x19f88
>>>>> +Found per-CPU symbol 'last_nmi_rip' at address 0x1a018
>>>>> +Found per-CPU symbol 'nmi_stats' at address 0x1a030
>>>>> +Found per-CPU symbol 'swallow_nmi' at address 0x1a020
>>>>> +Found per-CPU symbol 'nmi_state' at address 0x1a010
>>>>> +Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>> +Found per-CPU symbol 'nmi_cr2' at address 0x1a008
>>>>> +Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>> +Found per-CPU symbol 'cpu_tsc_khz' at address 0x19f88
>>>>> +Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>> +Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>> +Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>> ...
>>>>>
>>>>> And the lines 342-365 of the with_IBT result:
>>>>>        342 Found 341 per-CPU variables!
>>>>>        343 Found 61470 functions!
>>>>>        344 File .tmp_vmlinux.btf:
>>>>>        345 [1] INT long unsigned int size=8 nr_bits=64 encoding=(none)
>>>>>        346 [2] CONST (anon) type_id=2
>>>>>        347 [3] PTR (anon) type_id=6
>>>>>        348 [4] INT char size=1 nr_bits=8 encoding=(none)
>>>>>        349 [5] CONST (anon) type_id=5
>>>>>        350 [6] INT unsigned int size=4 nr_bits=32 encoding=(none)
>>>>>        351 [7] CONST (anon) type_id=7
>>>>>        352 [8] TYPEDEF __s8 type_id=10
>>>>>        353 [9] INT signed char size=1 nr_bits=8 encoding=SIGNED
>>>>>        354 [10] TYPEDEF __u8 type_id=12
>>>>>        355 [11] INT unsigned char size=1 nr_bits=8 encoding=(none)
>>>>>        356 [12] CONST (anon) type_id=12
>>>>>        357 [13] TYPEDEF __s16 type_id=15
>>>>>        358 [14] INT short int size=2 nr_bits=16 encoding=SIGNED
>>>>>        359 [15] TYPEDEF __u16 type_id=17
>>>>>        360 [16] INT short unsigned int size=2 nr_bits=16 encoding=(none)
>>>>>        361 [17] TYPEDEF __s32 type_id=19
>>>>>        362 [18] INT int size=4 nr_bits=32 encoding=SIGNED
>>>>>        363 [19] CONST (anon) type_id=19
>>>>>        364 [20] TYPEDEF __u32 type_id=7
>>>>>        365 [21] TYPEDEF __s64 type_id=23
>>>>>
>>>>> lines 342-362 of without_IBT
>>>>>
>>>>>        342 Found 341 per-CPU variables!
>>>>>        343 Found 61462 functions!
>>>>>        344 File .tmp_vmlinux.btf:
>>>>>        345 [1] FUNC_PROTO (anon) return=0 args=(void)
>>>>>        346 [2] FUNC verify_cpu type_id=1
>>>>>        347 [3] FUNC_PROTO (anon) return=0 args=(void)
>>>>>        348 [4] FUNC sev_verify_cbit type_id=3
>>>>>        349 search cu 'arch/x86/kernel/head_64.S' for percpu global variables.
>>>>>        350 Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>>        351 Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>>        352 Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>>        353 Found per-CPU symbol 'cpu_kick_mask' at address 0x19f78
>>>>>        354 Found per-CPU symbol 'cpu_tsc_khz' at address 0x19f88
>>>>>        355 Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>>        356 Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>>        357 Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>>        358 Found per-CPU symbol 'perf_nmi_tstamp' at address 0x19f70
>>>>>        359 Found per-CPU symbol 'current_tsc_ratio' at address 0x19fa0
>>>>>        360 Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>>        361 Found per-CPU symbol 'cpu_loops_per_jiffy' at address 0x18a08
>>>>>        362 Found per-CPU symbol 'kvm_running_vcpu' at address 0x19f80
>>>>>
>>>>> If the full debug files are useful or a target grep or diff is better let me know.
>>>>>
>>>> I managed to reproduce this too with IBT enabled; one thing I
>>>> noticed is with pahole built with an up-to-date libbpf and the
>>>> changes in https://github.com/acmel/dwarves/tree/next, the problem
>>>> went away. I didn't have time to root-cause it yet however.
>>>>
>>>> Not sure if you're in a position to do this, but if you can,
>>>> would you mind building pahole from
>>>>
>>>> https://github.com/acmel/dwarves/tree/next
>>>>
>>>> ...and re-testing to see if that helps? Thanks!
>>>>
>>>> Alan
>>>>> Thanks,
>>>>>
>>> I tried with libbpf compiled from master
>>> https://github.com/libbpf/libbpf.git and pahole compiled from next branch on
>>> https://github.com/acmel/dwarve with the same result.
>>> With IBT enabled pahole fails and removing it results in a successful
>>> kernel.
>> hi,
>> in case it slipped, you also need to add new options for pahole:
>>    https://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@xxxxxxxxxx/
>>
>> should be added for version 124 for now
>>
>> jirka
>
> Added the patch to include options on pahole but same problem.
> $ pahole --version
> v1.25
> $ ls -l /usr/lib64/libbpf.so.1.2.0
> -rwxr-xr-x 1 root root 422088 Feb  9 13:23 /usr/lib64/libbpf.so.1.2.0
>
>    UPD     include/generated/utsversion.h
>    CC      init/version-timestamp.o
>    LD      .tmp_vmlinux.btf
>    BTF     .btf.vmlinux.bin.o
> LLVM_OBJCOPY=objcopy pahole -J --btf_gen_floats -j
> --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
> .tmp_vmlinux.btf
> btf_encoder__encode: btf__dedup failed!
> Failed to encode BTF
>
> Thanks,
>

I hit the same problem when building a new kernel, and I think I have found
the cause of the error.

In short, enabling CONFIG_X86_KERNEL_IBT changes the order of the records in
the .notes section. In addition, for historical reasons the records in the
.notes section are not uniformly aligned, so gelf_getnote() cannot read any
record that follows a misparsed one.

For example:

$ readelf -n linux-6.2-rc7-with-IBT/.tmp_vmlinux.btf
Displaying notes found in: .notes
   Owner               Data size   Description
   GNU                  0x00000020       NT_GNU_PROPERTY_TYPE_0
       Properties: x86 feature used: x86, x87, MMX, XMM, FXSR, XSAVE
         x86 ISA used: x86-64-baseline, x86-64-v2, x86-64-v3
   Linux                0x00000004       func
    description data: 06 00 00 00
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x50
readelf: Warning: type: 0x78, namesize: 0x100, descsize: 0x756e694c, alignment: 8

$ readelf -n linux-6.2-rc7-no-IBT/.tmp_vmlinux.btf
Displaying notes found in: .notes
   Owner              Data size   Description
   GNU                  0x00000020       NT_GNU_PROPERTY_TYPE_0
       Properties: x86 feature used: x86, x87, MMX, XMM, FXSR, XSAVE
         x86 ISA used: x86-64-baseline, x86-64-v2, x86-64-v3
   GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
     Build ID: 073b8e5b0373cdc806fac20a9559461be75570a8
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x58
readelf: Warning: type: 0x756e694c, namesize: 0x4, descsize: 0x101, alignment: 8


As shown above, readelf cannot read all the records in the .notes section
whether or not IBT is enabled, and gelf_getnote() behaves the same way.
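
To illustrate how mixed padding can produce a bogus value, here is a minimal,
self-contained sketch. It is not pahole or elfutils code: put_note(),
first_desc_word() and lto_info_as_seen() are hypothetical helpers, and 0x101
and 0x100 are assumed to be the LTO-info and build-salt note types (they match
the stray values in the readelf warnings above). Two "Linux" notes are written
with 4-byte padding and then read back assuming either 4-byte or 8-byte
padding. Under the 8-byte assumption the reader lands on the next note's
namesz field, which would be one plausible explanation for the 6 seen above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Append one note (12-byte header, name, desc) using 4-byte padding.
 * Returns the offset at which the next note would start. */
static size_t put_note(uint8_t *buf, size_t off, const char *name,
		       uint32_t type, const void *desc, uint32_t descsz)
{
	/* Header layout: namesz, descsz, type (each 32 bits). */
	uint32_t hdr[3] = { (uint32_t)strlen(name) + 1, descsz, type };

	memcpy(buf + off, hdr, sizeof(hdr));
	off += sizeof(hdr);
	memcpy(buf + off, name, hdr[0]);
	off = align_up(off + hdr[0], 4);
	memcpy(buf + off, desc, descsz);
	return align_up(off + descsz, 4);
}

/* Read the first 32-bit word of the descriptor of the note at 'off',
 * assuming the name is padded to 'align' bytes. */
static uint32_t first_desc_word(const uint8_t *buf, size_t off, size_t align)
{
	uint32_t namesz, val;

	memcpy(&namesz, buf + off, sizeof(namesz));
	off = align_up(off + 12 + namesz, align);
	memcpy(&val, buf + off, sizeof(val));
	return val;
}

/* Build two 4-byte-padded "Linux" notes and return the value a reader
 * would see for the first one when it assumes 'align'-byte padding. */
static uint32_t lto_info_as_seen(size_t align)
{
	uint8_t buf[64] = { 0 };
	uint32_t lto = 0;        /* kernel built without LTO */
	const char salt[] = "";  /* empty build salt (assumed) */
	size_t off;

	off = put_note(buf, 0, "Linux", 0x101, &lto, sizeof(lto));
	put_note(buf, off, "Linux", 0x100, salt, sizeof(salt));
	return first_desc_word(buf, 0, align);
}
```

With these inputs, lto_info_as_seen(4) yields 0 (the value that was written),
while lto_info_as_seen(8) yields 6, the length of "Linux" plus its
terminating NUL taken from the following note's header.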

In dwarf_loader.c:3001, cus__merging_cu() decides whether CUs (compile units)
should be merged by checking the value of the LINUX_ELFNOTE_LTO_INFO note. The
note is defined in
https://github.com/torvalds/linux/blob/master/include/linux/elfnote-lto.h#L9,
and its value must be 0 or 1. In the readelf output above, however, the value
read is "06 00 00 00" (= 6), which is impossible. This confirms that the
.notes records have a format compatibility problem. A similar issue was
discussed at
https://lore.kernel.org/linux-arm-kernel/20210428172847.GC4022@xxxxxxx/
dwarf_loader.c:3001 tests the value with "!= 0". With IBT=y, gelf_getnote()
yields 0x6, so cus__merging_cu() returns true. With IBT=n, gelf_getnote()
fails before it reaches the LINUX_ELFNOTE_LTO_INFO record, so the function
returns false, which happens to be the right result. Since the kernel is not
built with CONFIG_LTO, merging compile units leads to undefined behavior.
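
The difference between the two tests can be sketched in isolation. The helper
names below are hypothetical and only model the final comparison, not the
note lookup that precedes it in cus__merging_cu():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current test: any non-zero value counts as "built with LTO". */
static bool merge_if_nonzero(uint32_t lto_info)
{
	return lto_info != 0;
}

/* Stricter test: only the documented value 1 counts. */
static bool merge_if_one(uint32_t lto_info)
{
	return lto_info == 1;
}

/* The two tests agree on the only valid values, 0 and 1. */
static void check_valid_values_agree(void)
{
	assert(merge_if_nonzero(0) == merge_if_one(0));
	assert(merge_if_nonzero(1) == merge_if_one(1));
}
```

For a misparsed value such as 6, merge_if_nonzero() returns true while
merge_if_one() returns false, so the stricter test does not turn garbage into
a request to merge CUs.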

Specifically, the original CUs contain tags such as DW_TAG_unspecified_type
that are filtered out by the BTF encoder. As a result, the small_id (assigned
by the DWARF reader) no longer lines up with the offset that the BTF encoder
uses, which finally leads to the "btf__dedup failed!" error.

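A toy model of that ID skew, under the assumption that references are copied
through unchanged while filtered entries consume no output ID. None of this
is pahole code; the toy_* types and functions are made up for illustration.
One filtered entry in front of an INT and a CONST is enough to make the CONST
point at itself, which resembles the "[2] CONST (anon) type_id=2" lines in
the verbose output quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

enum toy_tag { TOY_UNSPECIFIED, TOY_INT, TOY_CONST };

/* Reader side: entry i has small_id i + 1; ref is a small_id, 0 = none. */
struct toy_type {
	enum toy_tag tag;
	unsigned int ref;
};

/* Encoder side: output entry i has ID i + 1. */
struct toy_btf {
	enum toy_tag tag;
	unsigned int type_id;
};

/* Emit every entry except TOY_UNSPECIFIED, copying ref unchanged.
 * Returns the number of entries written to out. */
static size_t toy_encode(const struct toy_type *in, size_t n,
			 struct toy_btf *out)
{
	size_t emitted = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (in[i].tag == TOY_UNSPECIFIED)
			continue;	/* filtered: no output ID is used */
		out[emitted].tag = in[i].tag;
		out[emitted].type_id = in[i].ref;
		emitted++;
	}
	return emitted;
}

/* Returns the output ID of the first self-referencing entry, or 0. */
static unsigned int toy_first_self_ref(void)
{
	const struct toy_type in[] = {
		{ TOY_UNSPECIFIED, 0 },	/* small_id 1, filtered out */
		{ TOY_INT, 0 },		/* small_id 2 */
		{ TOY_CONST, 2 },	/* small_id 3, refers to the INT */
	};
	struct toy_btf out[3];
	size_t n = toy_encode(in, 3, out);

	for (size_t i = 0; i < n; i++)
		if (out[i].type_id == i + 1)
			return (unsigned int)(i + 1);
	return 0;
}
```

Here toy_first_self_ref() returns 2: the INT becomes output ID 1, the CONST
becomes output ID 2, and its copied reference 2 now names itself.
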
Here is a simple fix for pahole. To some extent it keeps cus__merging_cu()
from being misled by the alignment errors, but the fundamental solution is to
fix the alignment of the .notes section.

Signed-off-by: Tianyi Liu <i.pear@xxxxxxxxxxx>
---
 dwarf_loader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index a77598d..b2e9863 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -2998,7 +2998,7 @@ static bool cus__merging_cu(Dwarf *dw, Elf *elf)
 				if (strcmp((char *)data->d_buf + name_off, "Linux") != 0)
 					continue;
 
-				return *(int *)(data->d_buf + desc_off) != 0;
+				return *(int *)(data->d_buf + desc_off) == 1;
 			}
 		}
 	}
-- 
2.39.1