On Tue, 29 Nov 2022 07:48:12 +0100 Lukas Wunner <lukas@xxxxxxxxx> wrote: > On Mon, Nov 28, 2022 at 03:06:17PM -0700, Alex Williamson wrote: > > Agreed. Is this convoluted removal process being used to force a SBR, > > versus a FLR or PM reset that might otherwise be used by twiddling the > > reset attribute of the GPU directly? If so, the reset_method attribute > > can be used to force a bus reset and perform all the state save/restore > > handling to avoid reallocating BARs. A reset from the upstream switch > > port would only be necessary if you have some reason to also reset the > > switch downstream ports. Thanks, > > A Secondary Bus Reset is only offered as a reset_method if the > device to be reset is the *only* child of the upstream bridge. > I.e. if the device to be reset has siblings or children, > a Secondary Bus Reset is not permitted. > > Modern GPUs (including the one Mika is referring to) consist of > a PCIe switch with the GPU, HD audio and telemetry devices below > Downstream Bridges. A Secondary Bus Reset of the Root Port is > not allowed in this case because the Switch Upstream Port has > children. I didn't see such functions in the log provided, the GPU in question seems to be a single function device at 53:00.0. This matches what I've seen on an ARC A380 GPU where the GPU and HD audio are each single function devices under separate downstream ports of a PCIe switch. > See this code in pci_parent_bus_reset(): > > if (pci_is_root_bus(dev->bus) || dev->subordinate || > !dev->bus->self || dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET) > return -ENOTTY; > > The dev->subordinate check disallows a SBR if there are children. > Note that the code should probably instead check for... > (dev->subordinate && !list_empty(dev->subordinate->devices)) > ...because the port may have a subordinate bus without children > (may have been removed for example). > > The "no siblings" rule is enforced by: > > list_for_each_entry(pdev, &dev->bus->devices, bus_list) > if (pdev != dev) > return -ENOTTY; > > Note that the devices list is iterated without holding pci_bus_sem, > which looks fishy. > > That said, it *is* possible that a Secondary Bus Reset is erroneously > offered despite these checks because we perform them early on device > enumeration when the subordinate bus hasn't been scanned yet. > > So if the Root Port offers other reset methods besides SBR and the > user switches to one of them, then reinstates the defaults, > suddenly SBR will disappear because the subordinate bus has since > been scanned. What's missing here is that we re-check availability > of the reset methods on siblings and the parent when a device is > added or removed. This is also necessary to make reset_method > work properly with hotplug. However, the result may be that the > reset_method attribute in sysfs may become invisible after adding > a device (because there is no reset method available) and reappear > after removing a device. > > So the reset_method logic is pretty broken right now I'm afraid. I haven't checked for a while, but I thought we exposed SBR regardless of siblings, though it can't be accessed via the reset attribute if there are siblings. That allows that the sibling devices could be soft removed, a reset performed, and the bus re-scanned. If there are in fact sibling devices, it would make more sense to remove only those to effect a bus reset to avoid the resource issues with rescanning SR-IOV on the GPU. > In any case, for Mika's use case it would be useful to have a > "reset_subordinate" attribute on ports capable of a SBR such that > the entire hierarchy below is reset. The "reset" attribute is > insufficient. I'll toss out that a pretty simple vfio tool can be written to bind all the siblings on a bus enabling the hot reset ioctl in vfio. Thanks, Alex