Hi Mika,

On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions a multifunction device) and

                                  functions *of* a

> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topoology (QEMU parameters):

            topology

> -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2 \
> -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> -device e1000,bus=root_port13,addr=0.1 \
> -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3 \
> -device e1000,bus=fun1

If you use spaces instead of tabs above, the "\" will stay lined up
when git log indents.

> The first e1000 NIC here is another function in the switch upstream
> port. This leads to following errors:
>
> pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
>
> Fix this by taking into account the possible multifunction devices when
> uptream port resources are distributed.

"upstream", although I think I would word this so it's less
PCIe-centric. IIUC, we just want to account for all the BARs on the
bus, whether they're in bridges, peers in a multi-function device, or
other devices.

> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@xxxxxxxxxx/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> ---
>  drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..d456175ddc4f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		bridge = NULL;
> +
> +		/* Find the single bridge on this bus first */
> +		for_each_pci_bridge(dev, bus) {
> +			bridge = dev;
> +			break;
> +		}

If we just remember "bridge" in the loop before this hunk, could we
get rid of the loop here? E.g.,

  bridge = NULL;
  for_each_pci_bridge(dev, bus) {
    bridge = dev;
    if (dev->is_hotplug_bridge)
      hotplug_bridges++;
    else
      normal_bridges++;
  }

> +
> +		if (WARN_ON_ONCE(!bridge))
> +			return;

Then I think this would be superfluous.

> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * Reduce the space available for distribution by the
> +		 * amount required by the other devices on the same bus
> +		 * as this bridge.
> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;

Why do we skip "bridge"? Bridges are allowed to have two BARs
themselves, and it seems like they should be included here.

> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +				if (!dev_sz)
> +					continue;
> +
> +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> +					dev_res, (unsigned long long)dev_sz);
> +
> +				if (dev_sz > resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available resources to %pR\n",
> +					b_res);
> +			}
> +		}

This only happens for buses with a single bridge. Shouldn't it happen
regardless of how many bridges there are?

This block feels like something that could be split out to a separate
function. It looks like it only needs "bus", "io", "mmio",
"mmio_pref", and maybe "bridge".

I don't understand the "bridge" part; it looks like that's basically
to use 4K alignment for I/O windows and 1M for memory windows? Using
"bridge" seems like a clunky way to figure that out.

Bjorn
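
For illustration only, the split-out helper suggested above might look
roughly like the standalone sketch below. It is userspace C with
stand-in types, not kernel code: "struct res", the RES_* flag values,
the WINDOW_ALIGN_* constants and reserve_bus_resources() are all
hypothetical names, and the fixed 4K/1M alignment is an assumption
based on my reading of what the "bridge" argument is used for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct resource; the range [start, end] is inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
	unsigned long flags;	/* 0 means "empty / unset" in this sketch */
};

/* Hypothetical flag values, not the kernel's IORESOURCE_* bits. */
#define RES_IO			0x1UL
#define RES_MEM			0x2UL
#define RES_PREFETCH		0x4UL

/* Assumed bridge window granularity: 4K for I/O, 1M for memory. */
#define WINDOW_ALIGN_IO		0x1000ULL
#define WINDOW_ALIGN_MEM	0x100000ULL

static uint64_t res_size(const struct res *r)
{
	return r->flags ? r->end - r->start + 1 : 0;
}

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Shrink *avail by the window-aligned size of one device resource. */
static void reserve_one(struct res *avail, const struct res *dev_res,
			uint64_t align)
{
	uint64_t sz = align_up(res_size(dev_res), align);

	if (!sz)
		return;

	/* ">=" rather than ">" so that "end" never wraps below "start". */
	if (sz >= res_size(avail))
		memset(avail, 0, sizeof(*avail));
	else
		avail->end -= sz;
}

/*
 * Subtract every resource found on the bus from the matching available
 * window. The caller passes the resources of all devices on the bus,
 * including any BARs of the bridges themselves.
 */
static void reserve_bus_resources(const struct res *dev_res, size_t n,
				  struct res *io, struct res *mmio,
				  struct res *mmio_pref)
{
	for (size_t i = 0; i < n; i++) {
		const struct res *r = &dev_res[i];

		if (r->flags & RES_IO)
			reserve_one(io, r, WINDOW_ALIGN_IO);
		else if (r->flags & RES_MEM)
			reserve_one((r->flags & RES_PREFETCH) ? mmio_pref : mmio,
				    r, WINDOW_ALIGN_MEM);
	}
}
```

With the 2M window from the log above (0x10200000-0x103fffff) and the
128K e1000 BAR, this would take 1M off the end of the non-prefetchable
window and leave the rest for distribution to the bridge.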