On Fri, 8 Dec 2023 16:47:23 +0100 Igor Mammedov <imammedo@xxxxxxxxxx> wrote: > On Thu, 7 Dec 2023 17:28:15 -0600 > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > > On Wed, Nov 29, 2023 at 04:22:41PM +0100, Fiona Ebner wrote: > > > Hi, > > > it seems that hot-plugging SCSI disks for QEMU virtual machines booting > > > with UEFI and with guest kernels >= 6.5 might be broken. It's not > > > consistently broken, hinting there might be a race somewhere. > > > > > > Reverting the following two commits seems to make it work reliably again: > > > > > > cc22522fd55e2 ("PCI: acpiphp: Use > > > pci_assign_unassigned_bridge_resources() only for non-root bus") > > > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary" > > > > > > Of course, they might only expose some pre-existing issue, but this is > > > my best lead. See below for some logs and details about an affected > > > virtual machine. Happy to provide more information and to debug/test > > > further. > > > ... > > > > > I've attached some logs for guest using kernel 6.7.0-rc3 where hotplug > > > works rarely and guest using kernel 6.7.0-rc3 with the previously > > > mentioned commits reverted where hotplug works reliably: > > > > > > 6.7.0-rc3: > > > > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000 > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > Nov 29 15:12:02 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003) > > > > Nov 29 15:12:02 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11 > > > > Nov 29 15:12:02 hotplug kernel: scsi host3: Virtio SCSI HBA > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:12:02 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:12:02 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > What's the actual symptom that this is broken? All these log > > fragments show the exact same assignments for BARs 0, 1, 4 and for the > > bridge windows. > > > > I assume 0000:01:02.0 is the hot-added SCSI HBA, and 00:05.0 is a > > bridge leading to it? > > > > Can you put the complete dmesg logs somewhere? There's a lot of > > context missing here. > > > > Do you have to revert both cc22522fd55e2 and 40613da52b13f to make it > > work reliably? If we have to revert something, reverting one would be > > better than reverting both. > > here is simplified reproducer: > ./qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -cpu host \ > /dev/lvmpool/fedora-rawhide \ > -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 \ > -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1 \ > -blockdev raw,file.driver=file,file.filename=Fedora-Server-dvd-x86_64-Rawhide-20231127.n.0.iso,node-name=drive-scsi1 \ > -monitor stdio -serial file:/tmp/console_log > > then once booted at monitor prompt: > > (qemu) device_add virtio-scsi-pci,bus=pci.3,addr=2,id=virtioscsi1 > (qemu) device_add scsi-hd,id=scsi1,drive=drive-scsi1,bus=virtioscsi1.0 > > with distro shipped 6.7.0-0.rc2.20231125git0f5cc96c367f.26.fc40.x86_64 kernel > hotplugged HBA is visible but a disk hotplugged into it is not (like Fiona has reported). > > Problem happens when hotpluged virtio-scsi-pci is the 2nd HBA on the same bridge, > an attempt to rescan HBA (any on the bridge) causes guest hang. > > > However with the same 0f5cc96c367f commit, upstream kernel (without initrd and some minimal config): > > -kernel ./linux-2.6/arch/x86_64/boot/bzImage -append 'root=/dev/sda3 console=ttyS0 console=tty0' > > works as expected (aka disk is visible after hotplug) > > [ 75.636170] pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000 > [ 75.636178] pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f] > [ 75.637193] pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff] > [ 75.638441] pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > [ 75.647035] pci 0000:01:02.0: BAR 4: assigned [mem 0x380800004000-0x380800007fff 64bit pref] > [ 75.649461] pci 0000:01:02.0: BAR 1: assigned [mem 0xfe801000-0xfe801fff] > [ 75.650793] pci 0000:01:02.0: BAR 0: assigned [io 0xc040-0xc07f] > [ 75.652109] pci 0000:00:05.0: PCI bridge to [bus 01] > [ 75.653181] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff] > [ 75.656971] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff] > [ 75.659970] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref] > [ 75.664990] virtio-pci 0000:01:02.0: enabling device (0000 -> 0003) > [ 75.695505] scsi host3: Virtio SCSI HBA > [ 75.698099] pci 0000:00:05.0: PCI bridge to [bus 01] > [ 75.735840] pci 0000:00:05.0: bridge window [io 0xc000-0xcfff] > [ 75.740361] pci 0000:00:05.0: bridge window [mem 0xfe800000-0xfe9fffff] > [ 75.744088] pci 0000:00:05.0: bridge window [mem 0x380800000000-0x380fffffffff 64bit pref] > > > So, I'm still looking at where/why it goes wrong I've narrowed it down to ACPI PCI hotpug, (i.e. it works fine with SHPC and Fedora's kernel). Though I still can't reproduce with upstream kernel (even with Fedora config (mod=yes)). So far I was testing Seabios variant, next thing to try is UEFI setup (perhaps upstream will fail there) Fiona, Does it help if you use q35 machine with '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off' option? > > > Bjorn > > > > > Reboot > > > > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000 > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > Nov 29 15:12:52 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003) > > > > Nov 29 15:12:52 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11 > > > > Nov 29 15:12:52 hotplug kernel: scsi host3: Virtio SCSI HBA > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:12:52 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > Nov 29 15:12:52 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > > > RebootThe one time it did work. Note that the line with "QEMU HARDDISK" > > > comes after all lines with "bridge window": > > > > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000 > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > Nov 29 15:13:51 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003) > > > > Nov 29 15:13:51 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11 > > > > Nov 29 15:13:51 hotplug kernel: scsi host3: Virtio SCSI HBA > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: PCI bridge to [bus 01] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [io 0xe000-0xefff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc1400000-0xc15fffff] > > > > Nov 29 15:13:51 hotplug kernel: pci 0000:00:05.0: bridge window [mem 0xc000000000-0xc01fffffff 64bit pref] > > > > Nov 29 15:13:51 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0 > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB) > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08 > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > > > Nov 29 15:13:51 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk > > > > Nov 29 15:14:08 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully. > > > > > > 6.7.0-rc3 with the following reverted: > > > cc22522fd55e2 ("PCI: acpiphp: Use > > > pci_assign_unassigned_bridge_resources() only for non-root bus") > > > 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary") > > > > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: [1af4:1004] type 00 class 0x010000 > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x10: [io 0x0000-0x003f] > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x14: [mem 0x00000000-0x00000fff] > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 4: assigned [mem 0xc000004000-0xc000007fff 64bit pref] > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 1: assigned [mem 0xc1401000-0xc1401fff] > > > > Nov 29 15:15:37 hotplug kernel: pci 0000:01:02.0: BAR 0: assigned [io 0xe040-0xe07f] > > > > Nov 29 15:15:37 hotplug kernel: virtio-pci 0000:01:02.0: enabling device (0000 -> 0003) > > > > Nov 29 15:15:37 hotplug kernel: ACPI: \_SB_.LNKC: Enabled at IRQ 11 > > > > Nov 29 15:15:37 hotplug kernel: scsi host3: Virtio SCSI HBA > > > > Nov 29 15:15:37 hotplug kernel: scsi 3:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Attached scsi generic sg1 type 0 > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: Power-on or device reset occurred > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] 2048 512-byte logical blocks: (1.05 MB/1.00 MiB) > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write Protect is off > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Mode Sense: 63 00 00 08 > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > > > > Nov 29 15:15:37 hotplug kernel: sd 3:0:0:1: [sdb] Attached SCSI disk > > > > Nov 29 15:15:38 hotplug systemd[1]: systemd-fsckd.service: Deactivated successfully. > > > > > >