Hi Nathan,

Wentao and I have been looking into this issue over the past few weeks.
The high-level conclusion is that it seems to be a problem with lld; I
will go over the details here.

On 10/3/24 6:29 PM, Nathan Chancellor wrote:
> I seem to have narrowed down it to a few different configurations on top
> of x86_64_defconfig but I will include the full bad configuration as an
> attachment just in case anything else is relevant.
>
> $ echo 'CONFIG_LLVM_COV_KERNEL=y
> CONFIG_LLVM_COV_PROFILE_ALL=y' >kernel/configs/llvm_cov.config
>
> $ echo CONFIG_FORTIFY_SOURCE=y >kernel/configs/fortify_source.config
>
> $ echo CONFIG_AMD_MEM_ENCRYPT=y >arch/x86/configs/amd_mem_encrypt.config
>
> $ /usr/bin/time -v make -skj"$(nproc)" ARCH=x86_64 LLVM=1 mrproper {def,amd_mem_encrypt.,fortify_source.,llvm_cov.}config bzImage
> ...
> vmlinux.o: warning: objtool: __sev_es_nmi_complete+0x6e: call to kasan_check_write() leaves .noinstr.text section
> vmlinux.o: warning: objtool: do_syscall_64+0x141: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: do_int80_emulation+0x138: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: handle_bug+0x5: call to kmsan_unpoison_entry_regs() leaves .noinstr.text section
> vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x105: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: syscall_exit_to_user_mode+0x73: call to user_enter_irqoff() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x105: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_exit_to_user_mode+0x62: call to user_enter_irqoff() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_enter+0x45: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_exit+0x4a: call to lockdep_hardirqs_on() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_nmi_enter+0x4: call to lockdep_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: irqentry_nmi_exit+0x67: call to lockdep_on() leaves .noinstr.text section
> vmlinux.o: warning: objtool: enter_s2idle_proper+0xb5: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: cpuidle_enter_state+0x113: call to lockdep_hardirqs_off() leaves .noinstr.text section
> vmlinux.o: warning: objtool: default_idle_call+0xad: call to lockdep_hardirqs_on() leaves .noinstr.text section
> vmlinux.o: warning: objtool: cpu_idle_poll+0x29: call to lockdep_hardirqs_on() leaves .noinstr.text section
> vmlinux.o: warning: objtool: acpi_idle_enter_bm+0x118: call to lockdep_hardirqs_on() leaves .noinstr.text section
> vmlinux.o: warning: objtool: acpi_idle_do_entry+0x4: call to perf_lopwr_cb() leaves .noinstr.text section
> ...
> User time (seconds): 670.86
> System time (seconds): 459.05
> Percent of CPU this job got: 169%
> Elapsed (wall clock) time (h:mm:ss or m:ss): 11:06.15
> Average shared text size (kbytes): 0
> Average unshared data size (kbytes): 0
> Average stack size (kbytes): 0
> Average total size (kbytes): 0
> Maximum resident set size (kbytes): 38644844
> Average resident set size (kbytes): 0
> Major (requiring I/O) page faults: 18694
> Minor (reclaiming a frame) page faults: 23068856
> Voluntary context switches: 32215431
> Involuntary context switches: 46422
> Swaps: 0
> File system inputs: 0
> File system outputs: 40127696
> Socket messages sent: 0
> Socket messages received: 0
> Signals delivered: 0
> Page size (bytes): 4096
> Exit status: 0
>
> $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230707-182910/x86_64-rootfs.cpio.zst | zstd -d >rootfs.cpio
>
> $ qemu-system-x86_64 \
>     -display none \
>     -nodefaults \
>     -M q35 \
>     -d unimp,guest_errors \
>     -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
>     -kernel arch/x86/boot/bzImage
>     -initrd rootfs.cpio \
>     -cpu host \
>     -enable-kvm \
>     -m 8G \
>     -smp 8 \
>     -serial mon:stdio
> <hangs with no output>

This hang is caused by an early boot exception -- gdb shows that
execution reaches the halt loop in early_fixup_exception(). Dumping the
regs->ip associated with this exception points us to the following
instruction:

  ffffffff89b58074: 48 ff 05 85 7f 4a 76 incq 0x764a7f85(%rip) # 0 <fixed_percpu_data>

This is apparently an incorrect access to a per-cpu variable (the CPU
offset in %gs is needed), and it triggers a null-ptr-deref.

Without CONFIG_AMD_MEM_ENCRYPT (one of the bad configs), it turns out
that the instruction is actually accessing the LLVM profile counter of
strscpy():

  ffffffff89b85a04: 48 ff 05 6d 94 7d fa incq -0x5826b93(%rip) # ffffffff8435ee78 <__profc__Z13sized_strscpyPcU25pass_dynamic_object_size1PKcU25pass_dynamic_object_size1m>

This symbol is left undefined in the bad vmlinux, which explains why the
faulting instruction is accessing address 0. Tracing through the kernel
linking process shows that the symbol is still defined (as a weak
symbol) in vmlinux.a and vmlinux.o, but becomes undefined after the
first round of linking of the kernel image (.tmp_vmlinux1).

After playing with it a little, we found the creation of vmlinux.o to be
the problem. Specifically, if we use mold[1] instead of lld to create
the object and pass it to the later stages of kernel linking, the symbol
is properly defined as a data symbol (and the kernel can boot).

It seems that the issue does not reproduce with LLVM-20. Nevertheless,
we have reported[2] this to upstream LLVM.

[1]: https://github.com/rui314/mold
[2]: https://github.com/llvm/llvm-project/issues/116575

P.S.: We used mold because GNU ld is simply too slow with all these
llvm-cov sections -- the vmlinux.o step ran for 10+ hours and still
had not finished.
At the same time, the fact that the creation of vmlinux.o does not use a
linker script allows us to plug mold in directly.

>
> Cheers,
> Nathan

Best,
Jinghao