On Tue, 30 May 2023 13:55:12 -0500 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > On Tue, May 30, 2023 at 02:16:36PM -0400, Michael S. Tsirkin wrote: > > On Tue, May 30, 2023 at 12:12:44PM -0500, Bjorn Helgaas wrote: > > > On Mon, Apr 24, 2023 at 09:15:57PM +0200, Igor Mammedov wrote: > > > > When using ACPI PCI hotplug, hotplugging a device with > > > > large BARs may fail if bridge windows programmed by > > > > firmware are not large enough. > > > > > > > > Reproducer: > > > > $ qemu-kvm -monitor stdio -M q35 -m 4G \ > > > > -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \ > > > > -device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \ > > > > disk_image > > > > > > > > wait till linux guest boots, then hotplug device > > > > (qemu) device_add qxl,bus=rp1 > > > > > > > > hotplug on guest side fails with: > > > > pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000 > > > > pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff] > > > > pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff] > > > > pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff] > > > > pci 0000:01:00.0: reg 0x1c: [io 0x0000-0x001f] > > > > pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000] > > > > pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000] > > > > pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000] > > > > pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000] > > > > pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff] > > > > pci 0000:01:00.0: BAR 3: assigned [io 0x1000-0x101f] > > > > qxl 0000:01:00.0: enabling device (0000 -> 0003) > > > > > > Ugh, I just noticed that we turned on PCI_COMMAND_MEMORY even though > > > BARs 0 and 1 haven't been assigned. How did that happen? It looks > > > like pci_enable_resources() checks for that, but there must be a hole > > > somewhere. > > > > Maybe because BAR2 was assigned? I think pci_enable_resources just > > does > > if (r->flags & IORESOURCE_MEM) > > cmd |= PCI_COMMAND_MEMORY; > > in a loop so if any memory BARs are assigned then PCI_COMMAND_MEMORY > > is set. > > It does, but it also bails out if it finds IORESOURCE_UNSET: > > pci_enable_resources() > { > ... > pci_dev_for_each_resource(dev, r, i) { > ... > if (r->flags & IORESOURCE_UNSET) { > pci_err(dev, "can't enable device: BAR %d %pR not assigned\n"); > return -EINVAL; > } > ... > if (r->flags & IORESOURCE_MEM) > cmd |= PCI_COMMAND_MEMORY; > } > ... > } > > I expected that IORESOURCE_UNSET would still be there from > pci_assign_resource(), since we saw the "failed to assign" messages, > but there must be more going on. with current acpiphp code pci_assign_resource() isn't called, instead it goes __pci_bus_assign_resources() route. However I an reproduce similar issue with SHPC when using hierarchy deeper than 1 bridge (for which relocation has never worked) qemu-kvm -monitor stdio -M q35 -cpu host -enable-kvm -m 4G \ -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off \ -device pcie-root-port,id=pr1,bus=pcie.0,chassis=4 \ -device pcie-pci-bridge,id=br1,bus=pr1 hotplug device (qemu) device_add qxl,addr=1,bus=br1 shpchp 0000:01:00.0: Latch close on Slot(1) shpchp 0000:01:00.0: Button pressed on Slot(1) shpchp 0000:01:00.0: Card present on Slot(1) shpchp 0000:01:00.0: PCI slot #1 - powering on due to button press pci 0000:02:01.0: [1b36:0100] type 00 class 0x038000 pci 0000:02:01.0: reg 0x10: [mem 0x00000000-0x03ffffff] pci 0000:02:01.0: reg 0x14: [mem 0x00000000-0x03ffffff] pci 0000:02:01.0: reg 0x18: [mem 0x00000000-0x00001fff] pci 0000:02:01.0: reg 0x1c: [io 0x0000-0x001f] pci 0000:02:01.0: BAR 0: no space for [mem size 0x04000000] pci 0000:02:01.0: BAR 0: failed to assign [mem size 0x04000000] pci 0000:02:01.0: BAR 1: no space for [mem size 0x04000000] pci 0000:02:01.0: BAR 1: failed to assign [mem size 0x04000000] pci 0000:02:01.0: BAR 2: assigned [mem 0xfe600000-0xfe601fff] pci 0000:02:01.0: BAR 3: assigned [io 0xc000-0xc01f] ^^^^^^^^^^^ shpchp 0000:01:00.0: PCI bridge to [bus 02] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] shpchp 0000:01:00.0: bridge window [mem 0xfe600000-0xfe7fffff] shpchp 0000:01:00.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref] PCI: No. 2 try to assign unassigned res release child resource [mem 0xfe600000-0xfe601fff] shpchp 0000:01:00.0: resource 8 [mem 0xfe600000-0xfe7fffff] released shpchp 0000:01:00.0: PCI bridge to [bus 02] shpchp 0000:01:00.0: BAR 8: no space for [mem size 0x0a000000] shpchp 0000:01:00.0: BAR 8: failed to assign [mem size 0x0a000000] pci 0000:02:01.0: BAR 0: no space for [mem size 0x04000000] pci 0000:02:01.0: BAR 0: failed to assign [mem size 0x04000000] pci 0000:02:01.0: BAR 1: no space for [mem size 0x04000000] pci 0000:02:01.0: BAR 1: failed to assign [mem size 0x04000000] pci 0000:02:01.0: BAR 2: no space for [mem size 0x00002000] pci 0000:02:01.0: BAR 2: failed to assign [mem size 0x00002000] shpchp 0000:01:00.0: PCI bridge to [bus 02] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff] shpchp 0000:01:00.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref] qxl 0000:02:01.0: enabling device (0000 -> 0001) ^^^ IO res only where: assign_requested_resources_sorted() ... if (pci_assign_resource()) reset_resource(res); reset wipes everything pci_assign_resource() has done for failing resources leaving only assigned IO (on the 1st pass). Then later pci_enable_device_flags() will build mask for available bars for (i = 0; i <= PCI_ROM_RESOURCE; i++) if (dev->resource[i].flags & flags) bars |= (1 << i); for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) if (dev->resource[i].flags & flags) bars |= (1 << i); however since for failed MEM resources reset cleared flags along with everything else, above snippet will ignore MEM bars, leaving only assigned IO resource. Then do_pci_enable_device() -> pcibios_enable_device() -> pci_enable_resources(,bars) will happily succeed since mask tells it only to look into IO. And well even if mask weren't excluding MEM ones, it won't help since the resource was cleared out in assign_requested_resources_sorted(). Perhaps instead of playing with flags, we should somehow mark device with unusable resources as disabled, and fail pci_enable_device() early. (and also make sure its resources aren't accounted anymore on follow up hotplug events to the bridge)