Hi, On Mon, Jan 09, 2023 at 02:33:35PM -0500, Alexander Motin wrote: > On 04.01.2023 04:16, Mika Westerberg wrote: > > A PCI bridge may reside on a bus with other devices as well. The > > resource distribution code does not take this into account properly and > > therefore it expands the bridge resource windows too much, not leaving > > space for the other devices (or functions of a multifunction device) and > > this leads to an issue that Jonathan reported. He runs QEMU with the > > following topology (QEMU parameters): > > > > -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2 \ > > -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \ > > -device e1000,bus=root_port13,addr=0.1 \ > > -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3 \ > > -device e1000,bus=fun1 > > > > The first e1000 NIC here is another function in the switch upstream > > port. This leads to following errors: > > > > pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04] > > pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04] > > pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000] > > e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0] > > > > Fix this by taking into account the device BARs on the bus, including > > the ones belonging to bridges themselves. > > > > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@xxxxxxxxxx/ > > Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@xxxxxxxxxxxxx/ > > Reported-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > Reported-by: Alexander Motin <mav@xxxxxxxxxxxxx> > > Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> > > --- > > drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++--------------- > > 1 file changed, 129 insertions(+), 76 deletions(-) > > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c > > index b4096598dbcb..cf6a7bdf2427 100644 > > --- a/drivers/pci/setup-bus.c > > +++ b/drivers/pci/setup-bus.c > > @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res, > > resource_size_t new_size) > > { > > resource_size_t add_size, size = resource_size(res); > > + resource_size_t min_size; > > if (res->parent) > > return; > > @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res, > > if (!new_size) > > return; > > + /* Minimum granularity of a bridge bridge window */ > > + min_size = window_alignment(bridge->bus, res->flags); > > + > > if (new_size > size) { > > add_size = new_size - size; > > + if (add_size < min_size) > > + return; > > pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, > > &add_size); > > } else if (new_size < size) { > > add_size = size - new_size; > > + if (add_size < min_size) > > + return; > > May be I don't understand something, but in what situation it may happen, > and won't it be a problem if you silently do nothing here, while the calling > code will use the passed new_size as-is? Good point. I saw this happening when I run QEMU but now that I started to look at it again the reason seems to be that there is a 32-bit prefetchable option ROM BAR for the e1000 NIC and that gets reduced incorrectly by the reduce_dev_resources() which causes the "imbalance". > > pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res, > > &add_size); > > + } else { > > + return; > > } > > res->end = res->start + new_size - 1; > > remove_from_list(add_list, res); > > } > > +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io, > > + struct resource *mmio, > > + struct resource *mmio_pref) > > +{ > > + int i; > > + > > + for (i = 0; i < PCI_NUM_RESOURCES; i++) { > > + struct resource *res = &dev->resource[i]; > > + resource_size_t align, tmp, size; > > + > > + size = resource_size(res); > > + if (!size) > > + continue; > > + > > + align = pci_resource_alignment(dev, res); > > + > > + if (resource_type(res) == IORESOURCE_IO) { > > + align = align ? ALIGN(io->start, align) - io->start : 0; > > + tmp = align + size; > > + io->start = min(io->start + tmp, io->end + 1); > > + } else if (resource_type(res) == IORESOURCE_MEM) { > > + if (res->flags & IORESOURCE_PREFETCH) { > > + align = align ? ALIGN(mmio_pref->start, align) - > > + mmio_pref->start : 0; > > + tmp = align + size; > > + mmio_pref->start = min(mmio_pref->start + tmp, > > + mmio_pref->end + 1); > > + } else { > > + align = align ? ALIGN(mmio->start, align) - > > + mmio->start : 0; > > + tmp = align + size; > > + mmio->start = min(mmio->start + tmp, > > + mmio->end + 1); > > + } > > + } > > + } > > +} > > + > > +/* > > + * io, mmio and mmio_pref contain the total amount of bridge window > > + * space available. This includes the minimal space needed to cover all > > + * the existing devices on the bus and the possible extra space that can > > + * be shared with the bridges. > > + * > > + * The resource space consumed by bus->self (the bridge) is already > > + * reduced. > > + */ > > static void pci_bus_distribute_available_resources(struct pci_bus *bus, > > struct list_head *add_list, > > struct resource io, > > struct resource mmio, > > struct resource mmio_pref) > > { > > + resource_size_t io_align, mmio_align, mmio_pref_align; > > + resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b; > > unsigned int normal_bridges = 0, hotplug_bridges = 0; > > struct resource *io_res, *mmio_res, *mmio_pref_res; > > struct pci_dev *dev, *bridge = bus->self; > > - resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align; > > io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW]; > > mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW]; > > @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, > > * The alignment of this bridge is yet to be considered, hence it must > > * be done now before extending its bridge window. > > */ > > - align = pci_resource_alignment(bridge, io_res); > > - if (!io_res->parent && align) > > - io.start = min(ALIGN(io.start, align), io.end + 1); > > + io_align = pci_resource_alignment(bridge, io_res); > > + if (!io_res->parent && io_align) > > + io.start = min(ALIGN(io.start, io_align), io.end + 1); > > - align = pci_resource_alignment(bridge, mmio_res); > > - if (!mmio_res->parent && align) > > - mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1); > > + mmio_align = pci_resource_alignment(bridge, mmio_res); > > + if (!mmio_res->parent && mmio_align) > > + mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1); > > - align = pci_resource_alignment(bridge, mmio_pref_res); > > - if (!mmio_pref_res->parent && align) > > - mmio_pref.start = min(ALIGN(mmio_pref.start, align), > > + mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res); > > + if (!mmio_pref_res->parent && mmio_pref_align) > > + mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align), > > mmio_pref.end + 1); > > /* > > @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus, > > normal_bridges++; > > } > > - /* > > - * There is only one bridge on the bus so it gets all available > > - * resources which it can then distribute to the possible hotplug > > - * bridges below. > > - */ > > - if (hotplug_bridges + normal_bridges == 1) { > > - dev = list_first_entry(&bus->devices, struct pci_dev, bus_list); > > - if (dev->subordinate) > > - pci_bus_distribute_available_resources(dev->subordinate, > > - add_list, io, mmio, mmio_pref); > > - return; > > - } > > - > > - if (hotplug_bridges == 0) > > + if (!(hotplug_bridges + normal_bridges)) > > return; > > /* > > * Calculate the total amount of extra resource space we can > > - * pass to bridges below this one. This is basically the > > - * extra space reduced by the minimal required space for the > > - * non-hotplug bridges. > > + * pass to bridges below this one. This is basically the extra > > + * space reduced by the minimal required space for the bridge > > + * windows and device BARs on this bus. > > */ > > - for_each_pci_bridge(dev, bus) { > > - resource_size_t used_size; > > - struct resource *res; > > - > > - if (dev->is_hotplug_bridge) > > - continue; > > - > > - /* > > - * Reduce the available resource space by what the > > - * bridge and devices below it occupy. > > - */ > > - res = &dev->resource[PCI_BRIDGE_IO_WINDOW]; > > - align = pci_resource_alignment(dev, res); > > - align = align ? ALIGN(io.start, align) - io.start : 0; > > - used_size = align + resource_size(res); > > - if (!res->parent) > > - io.start = min(io.start + used_size, io.end + 1); > > - > > - res = &dev->resource[PCI_BRIDGE_MEM_WINDOW]; > > - align = pci_resource_alignment(dev, res); > > - align = align ? ALIGN(mmio.start, align) - mmio.start : 0; > > - used_size = align + resource_size(res); > > - if (!res->parent) > > - mmio.start = min(mmio.start + used_size, mmio.end + 1); > > + list_for_each_entry(dev, &bus->devices, bus_list) > > + reduce_dev_resources(dev, &io, &mmio, &mmio_pref); > > - res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW]; > > - align = pci_resource_alignment(dev, res); > > - align = align ? ALIGN(mmio_pref.start, align) - > > - mmio_pref.start : 0; > > - used_size = align + resource_size(res); > > - if (!res->parent) > > - mmio_pref.start = min(mmio_pref.start + used_size, > > - mmio_pref.end + 1); > > + /* > > + * If there is at least one hotplug bridge on this bus it gets > > + * all the extra resource space that was left after the > > + * reductions above. > > + * > > + * If there are no hotplug bridges the extra resource space is > > + * split between non-hotplug bridges. This is to allow possible > > + * hotplug bridges below them to get the extra space as well. > > + */ > > + if (hotplug_bridges) { > > + io_per_b = div64_ul(resource_size(&io), hotplug_bridges); > > + mmio_per_b = div64_ul(resource_size(&mmio), hotplug_bridges); > > + mmio_pref_per_b = div64_ul(resource_size(&mmio_pref), > > + hotplug_bridges); > > + } else { > > + io_per_b = div64_ul(resource_size(&io), normal_bridges); > > + mmio_per_b = div64_ul(resource_size(&mmio), normal_bridges); > > + mmio_pref_per_b = div64_ul(resource_size(&mmio_pref), > > + normal_bridges); > > } > > - io_per_hp = div64_ul(resource_size(&io), hotplug_bridges); > > - mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges); > > - mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref), > > - hotplug_bridges); > > - > > /* > > - * Go over devices on this bus and distribute the remaining > > - * resource space between hotplug bridges. > > + * Make sure the split resource space is properly aligned for > > + * bridge windows (align it down to avoid going above what is > > + * available). > > */ > > + if (io_align) > > + io_per_b = ALIGN_DOWN(io_per_b, io_align); > > + if (mmio_align) > > + mmio_per_b = ALIGN_DOWN(mmio_per_b, mmio_align); > > + if (mmio_pref_align) > > + mmio_pref_per_b = ALIGN_DOWN(mmio_pref_per_b, mmio_align); > > If I understand it right, you are applying alignment requirements of the > parent bridge to the windows of its children. I don't have examples of any > bridge with different alignment, but shouldn't we better get and use proper > alignment inside the loop below? Yes, I agree.