On Fri, 7 Jun 2024 17:33:20 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Tue, May 07, 2024 at 03:31:23PM -0600, Alex Williamson wrote:
> > Resizing BARs can be blocked when a device in the bridge hierarchy
> > itself consumes resources from the resized range.  This scenario is
> > common with Intel Arc DG2 GPUs where the following is a typical
> > topology:
> >
> >  +-[0000:5d]-+-00.0-[5e-61]----00.0-[5f-61]--+-01.0-[60]----00.0  Intel Corporation DG2 [Arc A380]
> >                                              \-04.0-[61]----00.0  Intel Corporation DG2 Audio Controller
> >
> > Here the system BIOS has provided a large 64bit, prefetchable window:
> >
> >   pci_bus 0000:5d: root bus resource [mem 0xb000000000-0xbfffffffff window]
> >
> > But only a small portion is programmed into the root port aperture:
> >
> >   pci 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
> >
> > The upstream port then provides the following aperture:
> >
> >   pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
> >
> > With the missing range found to be consumed by the switch port itself:
> >
> >   pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]
> >
> > The downstream port above the GPU provides the same aperture as upstream:
> >
> >   pci 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
> >
> > Which is entirely consumed by the GPU:
> >
> >   pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]
> >
> > In summary, iomem reports the following:
> >
> >   b000000000-bfffffffff : PCI Bus 0000:5d
> >     bfe0000000-bff07fffff : PCI Bus 0000:5e
> >       bfe0000000-bfefffffff : PCI Bus 0000:5f
> >         bfe0000000-bfefffffff : PCI Bus 0000:60
> >           bfe0000000-bfefffffff : 0000:60:00.0
> >       bff0000000-bff07fffff : 0000:5e:00.0
> >
> > The GPU at 0000:60:00.0 supports a Resizable BAR:
> >
> >   Capabilities: [420 v1] Physical Resizable BAR
> >     BAR 2: current size: 256MB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
> >
> > However when attempting a resize we get -ENOSPC:
> >
> >   pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
> >   pcieport 0000:5e:00.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
> >   pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: can't assign; no space
> >   pcieport 0000:5f:01.0: bridge window [mem size 0x200000000 64bit pref]: failed to assign
> >   pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: can't assign; no space
> >   pci 0000:60:00.0: BAR 2 [mem size 0x200000000 64bit pref]: failed to assign
> >   pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
> >   pcieport 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
> >   pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]
> >   pcieport 0000:5e:00.0: PCI bridge to [bus 5f-61]
> >   pcieport 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
> >   pcieport 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
> >   pcieport 0000:5f:01.0: PCI bridge to [bus 60]
> >   pcieport 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
> >   pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]
> >   pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: assigned
> >
> > In this example we need to resize all the way up to the root port
> > aperture, but we refuse to change the root port aperture while resources
> > are allocated for the upstream port BAR.
> >
> > The solution proposed here builds on the idea in commit 91fa127794ac
> > ("PCI: Expose PCIe Resizable BAR support via sysfs") where the BAR can
> > be resized while there is no driver attached.
> > In this case, when there
> > is no driver bound to the upstream switch port we'll release resources
> > of the bridge which match the reallocation.  Therefore we can achieve
> > the below successful resize operation by unbinding 0000:5e:00.0 from the
> > pcieport driver before invoking the resource2_resize interface on the
> > GPU at 0000:60:00.0.
>
> resource2_resize?  Oh, I guess this is the sysfs interface
> (resourceN_resize, which leads to pci_resize_resource(), and in this
> case we're resizing BAR 2 to 8GB,

Exactly.

> so it must have been something like
> this?  (2 ^ (13+20) == 8G)
>
>   # echo 0000:5f:01.0 > /sys/bus/pci/drivers/pcieport/unbind
>   # echo 0000:5e:00.0 > /sys/bus/pci/drivers/pcieport/unbind
>   # echo 0000:5d:00.0 > /sys/bus/pci/drivers/pcieport/unbind
>   # echo 13 > /sys/bus/pci/devices/0000:60:00.0/resource2_resize
>
> (Maybe we don't need 5d:00.0, since that looks like a Root Port and
> doesn't have a BAR that needs to be released?)

We don't actually need to unbind either the root port (5d:00.0) or the
downstream switch port (5f:01.0) since they don't consume any resources
from the aperture we need to resize.

For example if this were an AMD GPU we'd have a similar PCIe switch
topology but the upstream switch does not expose a 64-bit prefetchable
BAR, only the GPU endpoint itself consumes resources from that
aperture.  Therefore we'd only need to unbind the endpoint driver,
effect the resize, and rebind the endpoint driver.  Ex (assuming the
same topology):

# echo 0000:60:00.0 > /sys/bus/pci/devices/0000:60:00.0/driver/unbind
# echo 13 > /sys/bus/pci/devices/0000:60:00.0/resource2_resize
# echo 0000:60:00.0 > /sys/bus/pci/drivers_probe

The Intel GPU has made the unfortunate hardware decision to have the
upstream port consume resources from the same aperture as used by the
downstream resizable BAR, therefore the above steps fail with the
-ENOSPC error for an Intel Arc GPU.
This proposal allows it to work as:

# echo 0000:60:00.0 > /sys/bus/pci/devices/0000:60:00.0/driver/unbind
# echo 0000:5e:00.0 > /sys/bus/pci/devices/0000:5e:00.0/driver/unbind
# echo 13 > /sys/bus/pci/devices/0000:60:00.0/resource2_resize
# echo 0000:5e:00.0 > /sys/bus/pci/drivers_probe
# echo 0000:60:00.0 > /sys/bus/pci/drivers_probe

> And I guess we probably need to rebind pcieport afterwards so hotplug,
> etc will work again?

Yep.  Thanks,

Alex

> >   pci 0000:60:00.0: BAR 2 [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pcieport 0000:5f:01.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pci 0000:5e:00.0: bridge window [mem 0xbfe0000000-0xbfefffffff 64bit pref]: releasing
> >   pci 0000:5e:00.0: BAR 0 [mem 0xbff0000000-0xbff07fffff 64bit pref]: releasing
> >   pcieport 0000:5d:00.0: bridge window [mem 0xbfe0000000-0xbff07fffff 64bit pref]: releasing
> >   pcieport 0000:5d:00.0: bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]: assigned
> >   pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
> >   pci 0000:5e:00.0: BAR 0 [mem 0xb200000000-0xb2007fffff 64bit pref]: assigned
> >   pcieport 0000:5f:01.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
> >   pci 0000:60:00.0: BAR 2 [mem 0xb000000000-0xb1ffffffff 64bit pref]: assigned
> >   pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
> >   pci 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
> >   pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
> >   pcieport 0000:5d:00.0: PCI bridge to [bus 5e-61]
> >   pcieport 0000:5d:00.0: bridge window [mem 0xb9000000-0xba0fffff]
> >   pcieport 0000:5d:00.0: bridge window [mem 0xb000000000-0xb2ffffffff 64bit pref]
> >   pci 0000:5e:00.0: PCI bridge to [bus 5f-61]
> >   pci 0000:5e:00.0: bridge window [mem 0xb9000000-0xba0fffff]
> >   pci 0000:5e:00.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
> >   pcieport 0000:5f:01.0: PCI bridge to [bus 60]
> >   pcieport 0000:5f:01.0: bridge window [mem 0xb9000000-0xb9ffffff]
> >   pcieport 0000:5f:01.0: bridge window [mem 0xb000000000-0xb1ffffffff 64bit pref]
> >
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> >  drivers/pci/setup-bus.c | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 909e6a7c3cc3..15fc8e4e84c9 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -2226,6 +2226,26 @@ void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
> >
> > +static void pci_release_resource_type(struct pci_dev *pdev, unsigned long type)
> > +{
> > +	int i;
> > +
> > +	if (!device_trylock(&pdev->dev))
> > +		return;
> > +
> > +	if (pdev->dev.driver)
> > +		goto unlock;
> > +
> > +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
> > +		if (pci_resource_len(pdev, i) &&
> > +		    !((pci_resource_flags(pdev, i) ^ type) & PCI_RES_TYPE_MASK))
> > +			pci_release_resource(pdev, i);
> > +	}
> > +
> > +unlock:
> > +	device_unlock(&pdev->dev);
> > +}
> > +
> >  int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> >  {
> >  	struct pci_dev_resource *dev_res;
> > @@ -2260,8 +2280,10 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> >
> >  			pci_info(bridge, "%s %pR: releasing\n", res_name, res);
> >
> > -			if (res->parent)
> > +			if (res->parent) {
> >  				release_resource(res);
> > +				pci_release_resource_type(bridge, type);
> > +			}
> >  			res->start = 0;
> >  			res->end = 0;
> >  			break;
> > --
> > 2.44.0
> >
>
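
For anyone reproducing the sequence discussed above, here is a rough
sketch of the arithmetic and the ordering of the sysfs writes.  The
function names rebar_bytes and print_resize_steps are made up for
illustration and are not part of any kernel tool.  The script only
prints the commands rather than running them, and it assumes the
topology from this thread (endpoint 0000:60:00.0 behind upstream
switch port 0000:5e:00.0).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not an existing utility.

# Convert a resourceN_resize value to a size in bytes.
# As worked out in the thread, size = 2^(value + 20), so 13 -> 8GB.
rebar_bytes() {
    echo $(( 1 << ($1 + 20) ))
}

# Print (do not execute) the unbind/resize/rebind sequence from the thread.
# Arguments: endpoint BDF, upstream switch port BDF, BAR number, resize value.
print_resize_steps() {
    ep=$1
    up=$2
    bar=$3
    val=$4
    echo "echo $ep > /sys/bus/pci/devices/$ep/driver/unbind"
    echo "echo $up > /sys/bus/pci/devices/$up/driver/unbind"
    echo "echo $val > /sys/bus/pci/devices/$ep/resource${bar}_resize"
    echo "echo $up > /sys/bus/pci/drivers_probe"
    echo "echo $ep > /sys/bus/pci/drivers_probe"
}

rebar_bytes 13
print_resize_steps 0000:60:00.0 0000:5e:00.0 2 13
```

The printed commands need to be run as root, and only make sense on a
kernel carrying this proposal when the upstream port has a BAR in the
aperture being resized.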