On Mon, 28 Nov 2022 14:39:32 -0600
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Alex]
>
> Hi Mika,
>
> On Mon, Nov 28, 2022 at 01:14:14PM +0200, Mika Westerberg wrote:
> > Hi Bjorn,
> >
> > There is another PCI resource allocation issue with some Intel GPUs but
> > probably applies to other similar devices as well. This is something
> > encountered in data centers where they trigger reset (secondary bus
> > reset) to the GPUs if there is hang or similar detected. Basically they
> > do something like:
> >
> > 1. Unbind the graphics driver(s) through sysfs.
> > 2. Remove the PCIe devices under the root port or the PCIe switch
> > upstream port through sysfs (echo 1 > ../remove).
> > 3. Trigger reset through config space or use the sysfs reset attribute.
> > 4. Run rescan on the root bus (echo 1 > /sys/bus/pci/rescan)
> >
> > Expectation is to see the devices come back in the same way prior the
> > reset but what actually happens is that the Linux PCI resource
> > allocation fails to allocate space for some of the resources. In this
> > case it is the IOV BARs.
> >
> > BIOS allocates resources for all these at boot time but after the rescan
> > Linux tries to re-allocate them but since the allocation algorithm is
> > more "consuming" some of the BARs do not fit to the available resource
> > space.
>
> Thanks for the report! Definitely sounds like an issue. I doubt that
> I'll have time to work on it myself in the near future.
>
> Is the "remove" before the reset actually necessary? If we could
> avoid the removal, maybe the config space save/restore we already do
> around reset would avoid the issue?

Agreed. Is this convoluted removal process being used to force a SBR,
versus a FLR or PM reset that might otherwise be used by twiddling the
reset attribute of the GPU directly? If so, the reset_method attribute
can be used to force a bus reset and perform all the state save/restore
handling to avoid reallocating BARs. A reset from the upstream switch
port would only be necessary if you have some reason to also reset the
switch downstream ports. Thanks,

Alex
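
For reference, the reset_method approach suggested above might look
roughly like the following. This is only a sketch: the function name
pci_bus_reset and the PCI_SYSFS override are made up for illustration,
and it assumes a kernel that exposes the per-device reset_method
attribute and a device for which the "bus" method is actually supported
(the write to reset_method is expected to fail otherwise).

```shell
#!/bin/sh
# Sketch: force a secondary bus reset on a single PCI device through
# sysfs, without removing the device first.
# PCI_SYSFS is a hypothetical override for dry runs; by default it
# points at the real sysfs device directory.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

pci_bus_reset() {
    dev="$PCI_SYSFS/$1"

    # reset_method is only present when the device has at least one
    # supported reset method.
    if [ ! -e "$dev/reset_method" ]; then
        echo "no reset_method attribute for $1" >&2
        return 1
    fi

    # Restrict the reset methods for this device to "bus" only.
    echo bus > "$dev/reset_method" || return 1

    # Trigger the reset; the kernel saves and restores config space
    # around it, so the BARs are not reallocated.
    echo 1 > "$dev/reset"
}
```

It would be called as root with the GPU's address, for example
`pci_bus_reset 0000:03:00.0`, where that address is just a placeholder.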