On Wed, 10 Apr 2024 at 16:30, Pascal Ernster <git@xxxxxxxxxxxxxx> wrote:
>
> [2024-04-10 12:06] Ard Biesheuvel:
> > On Wed, 10 Apr 2024 at 11:03, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >>
> >> On Wed, 10 Apr 2024 at 09:00, Pascal Ernster <git@xxxxxxxxxxxxxx> wrote:
> >>>
> >>> [2024-04-10 07:34] Borislav Petkov:
> >>>> On Tue, Apr 09, 2024 at 06:38:53PM +0200, Pascal Ernster wrote:
> >>>>> Just to make sure this doesn't get lost: This patch causes the kernel to not
> >>>>> boot on several x86_64 VMs of mine (I haven't tested it on a bare metal
> >>>>> machine). For details and a kernel config to reproduce the issue, see https://lore.kernel.org/stable/fd186a2b-0c62-4942-bed3-a27d72930310@xxxxxxxxxxxxxx/
> >>>>
> >>
> >
> > Based on your XML description, I have extracted the command line
> > below, to boot a kernel built from the config you provided (but not
> > using the arch build scripts). I am using the same x86 initramfs I use
> > for all my boot testing, but that shouldn't make a difference here.
> >
> > Both your 'working' and 'broken' kernels work fine for me, both with
> > and without OVMF firmware, so I'm a bit stuck here. Could you please
> > try to reproduce using the command line below?
> >
> >
> > /usr/bin/qemu-system-x86_64 -name guest=kernel_issue,debug-threads=on
> > -machine pc-q35-8.2,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,hpet=off,acpi=on
> > -accel kvm -cpu host,migratable=on -m size=2097152k -object
> > '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}'
> > -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
> > 3ef94585-9ed2-464c-97ca-546fe9b42e2d -display none -no-user-config
> > -nodefaults -rtc base=utc,driftfix=slew -global
> > kvm-pit.lost_tick_policy=delay -no-shutdown -global
> > ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on
> > -kernel /usr/local/google/home/ardb/linux-build/arch/x86/boot/bzImage
> > -initrd /usr/local/google/home/ardb/rootfs-x86.cpio.gz -append
> > 'console=ttyS0,115200 intel_iommu=on lockdown=confidentiality
> > ia32_emulation=0 usbcore.nousb loglevel=7
> > earlyprintk=serial,ttyS0,115200' -device
> > '{"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"}'
> > -device '{"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"}'
> > -device '{"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"}'
> > -device '{"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"}'
> > -device '{"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"}'
> > -device '{"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"}'
> > -chardev stdio,id=charserial0 -device
> > '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}'
> > -audiodev '{"id":"audio1","driver":"none"}' -global
> > ICH9-LPC.noreboot=off -watchdog-action reset -device
> > '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.4","addr":"0x0"}'
> > -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> > -msg timestamp=on
> >
> The error also seems to occur with the /usr/bin/qemu-system-x86_64
> command you posted. I can't see the serial output, but I can see the
> persistent 100% CPU load that only occurs with the broken kernel but not
> with the kernel where your patch was reverted.
>
> I've written a shell script that should allow you to reproduce
> everything, and I've trimmed down the kernel config (included within the
> shell script) even further to reduce compile times. Whilst writing the
> script, I've found that the issue seems to only occur when I boot
> bzImage, but not when I boot the vmlinux image.
>
> Regarding the linker used: When building the kernel using my PKGBUILD, I
> used mold as linker, but when writing the attached reproducer script, I
> used the "normal" ld from the Archlinux binutils 2.42-2 package, and I
> can confirm that the issue also does also occur when binutils is used
> instead of mold.
>
> Running the script in tmpfs takes about 10-15 minutes on an Intel i5
> 8500 with sufficient RAM, and it compiles both the "normal" version of
> the kernel and a version with your patch reverted.
>

Thanks, this is very helpful. However, both bzImage-fixed and
bzImage-broken boot happily for me.

I am using

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-10' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gc6
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Debian 13.2.0-10)

$ ld -v
GNU ld (GNU Binutils for Debian) 2.41.90.20240122

$ qemu-system-x86_64 --version
QEMU emulator version 8.2.1 (Debian 1:8.2.1+ds-1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

You can grab my bzImage here:
http://files.workofard.com/bzImage-broken