[2024-04-10 12:06] Ard Biesheuvel:
On Wed, 10 Apr 2024 at 11:03, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
On Wed, 10 Apr 2024 at 09:00, Pascal Ernster <git@xxxxxxxxxxxxxx> wrote:
[2024-04-10 07:34] Borislav Petkov:
On Tue, Apr 09, 2024 at 06:38:53PM +0200, Pascal Ernster wrote:
Just to make sure this doesn't get lost: This patch causes the kernel to not boot on several x86_64 VMs of mine (I haven't tested it on a bare metal machine). For details and a kernel config to reproduce the issue, see https://lore.kernel.org/stable/fd186a2b-0c62-4942-bed3-a27d72930310@xxxxxxxxxxxxxx/

Based on your XML description, I have extracted the command line below, to boot a kernel built from the config you provided (but not using the arch build scripts). I am using the same x86 initramfs I use for all my boot testing, but that shouldn't make a difference here.

Both your 'working' and 'broken' kernels work fine for me, both with and without OVMF firmware, so I'm a bit stuck here. Could you please try to reproduce using the command line below?

/usr/bin/qemu-system-x86_64 -name guest=kernel_issue,debug-threads=on -machine pc-q35-8.2,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,hpet=off,acpi=on -accel kvm -cpu host,migratable=on -m size=2097152k -object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 3ef94585-9ed2-464c-97ca-546fe9b42e2d -display none -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -kernel /usr/local/google/home/ardb/linux-build/arch/x86/boot/bzImage -initrd /usr/local/google/home/ardb/rootfs-x86.cpio.gz -append 'console=ttyS0,115200 intel_iommu=on lockdown=confidentiality ia32_emulation=0 usbcore.nousb loglevel=7 earlyprintk=serial,ttyS0,115200' -device '{"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"}' -device '{"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"}' -device '{"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"}' -device '{"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"}' -device '{"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"}' -device '{"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"}' -chardev stdio,id=charserial0 -device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' -audiodev '{"id":"audio1","driver":"none"}' -global ICH9-LPC.noreboot=off -watchdog-action reset -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.4","addr":"0x0"}' -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
The error also seems to occur with the /usr/bin/qemu-system-x86_64 command you posted. I can't see the serial output, but I can see a persistent 100% CPU load that occurs only with the broken kernel, not with the kernel where your patch was reverted.
I've written a shell script that should allow you to reproduce everything, and I've trimmed down the kernel config (included within the shell script) even further to reduce compile times. Whilst writing the script, I've found that the issue seems to only occur when I boot bzImage, but not when I boot the vmlinux image.
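For anyone who wants to compare the two images without running the full script, a minimal sketch of the idea follows. It is not taken from reproduce.sh: the helper name boot_cmd, the trimmed option set, the initrd file name and the 60-second timeout are all assumptions chosen for illustration. Booting an ELF vmlinux with -kernel also assumes, as far as I know, that the kernel was built with PVH support.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of reproduce.sh: prints a trimmed QEMU
# command line for a given kernel image, so that bzImage and vmlinux can
# be booted with otherwise identical settings.
# Assumptions: the initrd name and the 60 s timeout are placeholders.
boot_cmd() {
    image=$1
    initrd=${2:-rootfs-x86.cpio.gz}
    # printf reuses the '%s ' format for every argument, giving one line.
    printf '%s ' timeout 60 qemu-system-x86_64 \
        -machine q35 -accel kvm -cpu host -m 2G -smp 1 \
        -display none -no-reboot -nodefaults \
        -serial stdio \
        -kernel "$image" -initrd "$initrd" \
        -append "'console=ttyS0,115200 earlyprintk=serial,ttyS0,115200 loglevel=7'"
    printf '\n'
}

# Print both variants; only the -kernel argument differs.
boot_cmd arch/x86/boot/bzImage
boot_cmd vmlinux
```

The printed lines can be pasted into a shell from the top of the build tree. With -serial stdio the guest console should appear on the terminal, and timeout ends a guest that spins without producing output.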
Regarding the linker used: When building the kernel using my PKGBUILD, I used mold as the linker, but when writing the attached reproducer script, I used the "normal" ld from the Arch Linux binutils 2.42-2 package, and I can confirm that the issue also occurs when binutils is used instead of mold.
Running the script in tmpfs takes about 10-15 minutes on an Intel i5 8500 with sufficient RAM, and it compiles both the "normal" version of the kernel and a version with your patch reverted.
Regards
Pascal
Attachment: reproduce.sh
Description: application/shellscript