[+cc Fiona, Thomas] On Mon, Dec 11, 2023 at 04:14:25PM -0800, Dongli Zhang wrote: > This is to report a regression caused by commit 40613da52b13 ("PCI: acpiphp: > Reassign resources on bridge if necessary"). > > PCI: acpiphp: Reassign resources on bridge if necessary > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40613da52b13fb21c5566f10b287e0ca8c12c4e9 Thanks very much for the report and the detailed analysis! Just for completeness, I'm cc'ing folks from another report that appears to be the same or very similar issue: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@xxxxxxxxxxx There's also a bugzilla with some logs attached: https://bugzilla.kernel.org/show_bug.cgi?id=218255 #regzbot dup: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@xxxxxxxxxxx > That patch may reconfigure the mmio resource of the bridge so that to write to > mmio during PCI hotplug may lose effect. > > Here is how to reproduce the issue. > > 1. Create QEMU (v8.1.0) VM with the below. > > qemu-system-x86_64 -machine pc,accel=kvm \ > -smp 8 -m 8G -cpu host -hda uefi-os.qcow2 \ > -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly \ > -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd \ > -kernel mainline-linux/arch/x86_64/boot/bzImage \ > -append "root=/dev/sda2 init=/sbin/init text console=ttyS0 loglevel=7" \ > -serial none -display none -vnc :9 -monitor stdio \ > -net nic -net user,hostfwd=tcp::5027-:22 \ > -device pci-bridge,id=bridge1,bus=pci.0,chassis_nr=2 > > We should have CONFIG_SCSI_SCAN_ASYNC=y so that scsi probing happens in another > thread. > > 2. Hot-add two vhost-scsi PCI devices consecutively. > > (qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0985,id=vhost01,bus=bridge1,addr=8 > (qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0986,id=vhost02,bus=bridge1,addr=0 > > 3. The 1st hot-add succeeds. > > [ 44.365111] ACPI: \_SB_.PCI0.S20_.S40_: Device check in hotplug_event() > [ 44.365261] pci 0000:01:08.0: [1af4:1004] type 00 class 0x010000 > [ 44.365377] pci 0000:01:08.0: reg 0x10: [io 0x0000-0x003f] > [ 44.365425] pci 0000:01:08.0: reg 0x14: [mem 0x00000000-0x00000fff] > [ 44.365589] pci 0000:01:08.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > [ 44.366479] pci 0000:01:08.0: EDR: Notify handler installed > [ 44.367255] pci 0000:01:08.0: vgaarb: pci_notify > [ 44.367490] pci 0000:00:04.0: BAR 15: assigned [mem 0x800100000-0x8001fffff 64bit pref] > [ 44.367497] pci 0000:01:08.0: BAR 4: assigned [mem 0x800100000-0x800103fff 64bit pref] > [ 44.367579] pci 0000:01:08.0: BAR 1: assigned [mem 0xc1000000-0xc1000fff] > [ 44.367605] pci 0000:01:08.0: BAR 0: assigned [io 0xc000-0xc03f] > [ 44.367638] pci 0000:00:04.0: PCI bridge to [bus 01] > [ 44.367653] pci 0000:00:04.0: bridge window [io 0xc000-0xcfff] > [ 44.369258] pci 0000:00:04.0: bridge window [mem 0xc1000000-0xc11fffff] > [ 44.370134] pci 0000:00:04.0: bridge window [mem 0x800100000-0x8001fffff 64bit pref] > [ 44.372147] virtio-pci 0000:01:08.0: vgaarb: pci_notify > [ 44.372155] virtio-pci 0000:01:08.0: runtime IRQ mapping not provided by arch > [ 44.409717] ACPI: \_SB_.LNKD: Enabled at IRQ 10 > [ 44.409781] virtio-pci 0000:01:08.0: enabling device (0000 -> 0003) > [ 44.448742] virtio-pci 0000:01:08.0: enabling bus mastering > [ 44.452982] scsi host2: Virtio SCSI HBA > [ 44.470553] virtio-pci 0000:01:08.0: vgaarb: pci_notify > [ 44.471124] scsi 2:0:1:0: Direct-Access LIO-ORG lun1 4.0 PQ: 0 ANSI: 5 > [ 44.490102] sd 2:0:1:0: Attached scsi generic sg2 type 0 > [ 44.490240] sd 2:0:1:0: [sdb] 262144 512-byte logical blocks: (134 MB/128 MiB) > [ 44.490275] sd 2:0:1:0: [sdb] Write Protect is off > [ 44.490279] sd 2:0:1:0: [sdb] Mode Sense: 43 00 10 08 > [ 44.490333] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA > [ 44.501807] sd 2:0:1:0: [sdb] Preferred minimum I/O size 512 bytes > [ 44.501812] sd 2:0:1:0: [sdb] Optimal transfer size 8388608 bytes > [ 44.504009] sd 2:0:1:0: [sdb] Attached SCSI disk > > 4. The 2nd hot-add fails. We do not see scsi probing finished. > > [ 52.285341] ACPI: \_SB_.PCI0.S20_.S00_: Device check in hotplug_event() > [ 52.285488] pci 0000:01:00.0: [1af4:1004] type 00 class 0x010000 > [ 52.285620] pci 0000:01:00.0: reg 0x10: [io 0x0000-0x003f] > [ 52.285668] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x00000fff] > [ 52.285833] pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] > [ 52.286687] pci 0000:01:00.0: EDR: Notify handler installed > [ 52.287484] pci 0000:01:00.0: vgaarb: pci_notify > [ 52.287568] pci 0000:01:00.0: BAR 4: assigned [mem 0x800104000-0x800107fff 64bit pref] > [ 52.287650] pci 0000:01:00.0: BAR 1: assigned [mem 0xc1001000-0xc1001fff] > [ 52.287676] pci 0000:01:00.0: BAR 0: assigned [io 0xc040-0xc07f] > [ 52.287701] pci 0000:00:04.0: PCI bridge to [bus 01] > [ 52.287715] pci 0000:00:04.0: bridge window [io 0xc000-0xcfff] > [ 52.289611] pci 0000:00:04.0: bridge window [mem 0xc1000000-0xc11fffff] > [ 52.291062] pci 0000:00:04.0: bridge window [mem 0x800100000-0x8001fffff 64bit pref] > [ 52.294057] virtio-pci 0000:01:00.0: vgaarb: pci_notify > [ 52.294065] virtio-pci 0000:01:00.0: runtime IRQ mapping not provided by arch > [ 52.294083] virtio-pci 0000:01:00.0: enabling device (0000 -> 0003) > [ 52.335305] virtio-pci 0000:01:00.0: enabling bus mastering > [ 52.339810] scsi host3: Virtio SCSI HBA > [ 52.358165] virtio-pci 0000:01:00.0: vgaarb: pci_notify > [ 52.358380] pci 0000:00:04.0: PCI bridge to [bus 01] > [ 52.358398] pci 0000:00:04.0: bridge window [io 0xc000-0xcfff] > [ 52.358518] scsi 3:0:1:0: Direct-Access LIO-ORG lun2 4.0 PQ: 0 ANSI: 5 > [ 52.360287] pci 0000:00:04.0: bridge window [mem 0xc1000000-0xc11fffff] > [ 52.362313] pci 0000:00:04.0: bridge window [mem 0x800100000-0x8001fffff 64bit pref] > > It is because of the following. To reconfigure the mmio resource of bridge and > to probe scsi happens at the same time, in different threads. > > [ 52.358398] pci 0000:00:04.0: bridge window [io 0xc000-0xcfff] > [ 52.358518] scsi 3:0:1:0: Direct-Access LIO-ORG lun2 4.0 PQ: 0 ANSI: 5 > [ 52.360287] pci 0000:00:04.0: bridge window [mem 0xc1000000-0xc11fffff] > [ 52.362313] pci 0000:00:04.0: bridge window [mem 0x800100000-0x8001fffff 64bit pref] > > The race starts since line 638. > > 627 static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge) > 628 { > 629 struct resource *res; > 630 struct pci_bus_region region; > 631 u32 l, bu, lu; > 632 > 633 /* > 634 * Clear out the upper 32 bits of PREF limit. If > 635 * PCI_PREF_BASE_UPPER32 was non-zero, this temporarily disables > 636 * PREF range, which is ok. > 637 */ > 638 pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, 0); > 639 > 640 /* Set up PREF base/limit */ > 641 bu = lu = 0; > 642 res = &bridge->resource[PCI_BRIDGE_PREF_MEM_WINDOW]; > 643 pcibios_resource_to_bus(bridge->bus, ®ion, res); > 644 if (res->flags & IORESOURCE_PREFETCH) { > 645 l = (region.start >> 16) & 0xfff0; > 646 l |= region.end & 0xfff00000; > 647 if (res->flags & IORESOURCE_MEM_64) { > 648 bu = upper_32_bits(region.start); > 649 lu = upper_32_bits(region.end); > 650 } > 651 pci_info(bridge, " bridge window %pR\n", res); > 652 } else { > 653 l = 0x0000fff0; > 654 } > 655 pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l); > 656 > 657 /* Set the upper 32 bits of PREF base & limit */ > 658 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, bu); > 659 pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, lu); > 660 } > > > To probe the vhost-scsi and to reconfigure bridge window happen in different > threads. There is a race between them so that to write to mmio may not work. It > is called ioeventfd from KVM/QEMU's perspective. > > [0] pci_setup_bridge_mmio_pref > [0] pci_setup_bridge > [0] pci_assign_unassigned_bridge_resources > [0] enable_slot > [0] acpiphp_check_bridge > [0] acpiphp_hotplug_notify > [0] acpi_device_hotplug > [0] acpi_hotplug_work_fn > [0] process_one_work > [0] worker_thread > [0] kthread > [0] ret_from_fork > [0] ret_from_fork_asm > > [0] blk_execute_rq > [0] scsi_execute_cmd > [0] scsi_probe_lun > [0] scsi_probe_and_add_lun > [0] __scsi_scan_target > [0] scsi_scan_channel > [0] scsi_scan_host_selected > [0] do_scsi_scan_host > [0] do_scan_async > [0] async_run_entry_fn > [0] process_one_work > [0] worker_thread > [0] kthread > [0] ret_from_fork > > > Indeed, I am curious if this impacts other PCI devices on this PCI bridge, > doing mmio (e.g., to kick the doorbell register during that time). > > There used to be similar issue. According to prior commits, we may need to > touch the bridge window for only once. > > PCI: Probe bridge window attributes once at enumeration-time > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521 > > > Thank you very much! > > Dongli Zhang