Hi, On Tue, Nov 29, 2022 at 06:52:42AM -0700, Alex Williamson wrote: > On Tue, 29 Nov 2022 07:48:12 +0100 > Lukas Wunner <lukas@xxxxxxxxx> wrote: > > > On Mon, Nov 28, 2022 at 03:06:17PM -0700, Alex Williamson wrote: > > > Agreed. Is this convoluted removal process being used to force a SBR, > > > versus a FLR or PM reset that might otherwise be used by twiddling the > > > reset attribute of the GPU directly? If so, the reset_method attribute > > > can be used to force a bus reset and perform all the state save/restore > > > handling to avoid reallocating BARs. A reset from the upstream switch > > > port would only be necessary if you have some reason to also reset the > > > switch downstream ports. Thanks, > > > > A Secondary Bus Reset is only offered as a reset_method if the > > device to be reset is the *only* child of the upstream bridge. > > I.e. if the device to be reset has siblings or children, > > a Secondary Bus Reset is not permitted. > > > > Modern GPUs (including the one Mika is referring to) consist of > > a PCIe switch with the GPU, HD audio and telemetry devices below > > Downstream Bridges. A Secondary Bus Reset of the Root Port is > > not allowed in this case because the Switch Upstream Port has > > children. > > I didn't see such functions in the log provided, the GPU in question > seems to be a single function device at 53:00.0. This matches what > I've seen on an ARC A380 GPU where the GPU and HD audio are each single > function devices under separate downstream ports of a PCIe switch. Yes, this one is similar. There is a PCIe switch and then bunch of devices connected to the downstream ports. One of them being the GPU. Sorry if I missed that part in the report. There are typically multiple of these cards and they want to perform the reset seperately for each. > > See this code in pci_parent_bus_reset(): > > > > if (pci_is_root_bus(dev->bus) || dev->subordinate || > > !dev->bus->self || dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET) > > return -ENOTTY; > > > > The dev->subordinate check disallows a SBR if there are children. > > Note that the code should probably instead check for... > > (dev->subordinate && !list_empty(dev->subordinate->devices)) > > ...because the port may have a subordinate bus without children > > (may have been removed for example). > > > > The "no siblings" rule is enforced by: > > > > list_for_each_entry(pdev, &dev->bus->devices, bus_list) > > if (pdev != dev) > > return -ENOTTY; > > > > Note that the devices list is iterated without holding pci_bus_sem, > > which looks fishy. > > > > That said, it *is* possible that a Secondary Bus Reset is erroneously > > offered despite these checks because we perform them early on device > > enumeration when the subordinate bus hasn't been scanned yet. > > > > So if the Root Port offers other reset methods besides SBR and the > > user switches to one of them, then reinstates the defaults, > > suddenly SBR will disappear because the subordinate bus has since > > been scanned. What's missing here is that we re-check availability > > of the reset methods on siblings and the parent when a device is > > added or removed. This is also necessary to make reset_method > > work properly with hotplug. However, the result may be that the > > reset_method attribute in sysfs may become invisible after adding > > a device (because there is no reset method available) and reappear > > after removing a device. > > > > So the reset_method logic is pretty broken right now I'm afraid. > > I haven't checked for a while, but I thought we exposed SBR regardless > of siblings, though it can't be accessed via the reset attribute if > there are siblings. That allows that the sibling devices could be soft > removed, a reset performed, and the bus re-scanned. If there are in > fact sibling devices, it would make more sense to remove only those to > effect a bus reset to avoid the resource issues with rescanning SR-IOV > on the GPU. If I understand correctly they perform the reset just above the upstream port of the PCIe switch so that it resets the whole "card". > > > In any case, for Mika's use case it would be useful to have a > > "reset_subordinate" attribute on ports capable of a SBR such that > > the entire hierarchy below is reset. The "reset" attribute is > > insufficient. > > I'll toss out that a pretty simple vfio tool can be written to bind all > the siblings on a bus enabling the hot reset ioctl in vfio. Thanks, Sounds good but unfortunately I'm not fluent in vfio so I have no idea how this simple tool could be done :( Do you have any examples or pointers that we could use to try this out? Thanks!