On Fri, 14 Dec 2018 11:18:38 -0600 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > [+cc Sinan, Alex, Rafael] > > On Fri, Dec 14, 2018 at 01:15:58PM +0000, Ross Lagerwall wrote: > > Hi Bjorn, > > > > Your commit 5b0764cac9f1 ("PCI: Probe for device reset support during > > enumeration") moved checking whether a device could be reset much earlier to > > when the device is first probed. When the device is first probed, the other > > devices on the bus may not have been discovered yet. This means that we will > > claim to support SBR as a reset mechanism because it is the only device > > behind the bus at that pointer meanwhile the others simply haven't been > > discovered yet. This results in dev->reset_fn being incorrectly set to true > > and a reset file being created. When userspace actually tries to use the > > reset file it fails because now there are other sibling devices preventing > > the use of an SBR. > > First of all, I'm very sorry about the regression and thanks very much > for the report! I know it's a lot of work to track down things like > this. > > We run pci_probe_reset_function() during pci_init_capabilities(), > which is before other devices on the bus are discovered, as you say. > That checks not just for SBR support, but for other types of reset > (FLR, PM, etc) as well. > > In your case the userspace reset actually fails, so I assume that > means SBR is the *only* supported reset type for that device? > > Since you have two devices on the bus, I guess SBR of device A > *should* work during the interval between enumerating A and B (it will > reset both A and B in hardware, but since the OS doesn't know about B > yet, that's probably OK). After we enumerate B and potentially bind a > driver to it, we can't use SBR any more, of course. > > I could imagine removing A's sysfs reset file when B is enumerated, > but the locking might be ugly. And of course A might support other > types of reset, and then you *want* A's sysfs file. I've actually wished for the opposite of this in the past, given a bus with devices A and B with no sysfs reset file, I wish I could 'echo 1 > B/remove' to rescan for SBR for device A. > What happened prior to 5b0764cac9f1? There was never a sysfs reset > file at all, so userspace didn't even attempt the reset? Yep. > We could wait to create the sysfs reset file until after we've > enumerated all the devices on the bus, but then we're opening a race > when a device could be hot-added to the bus. E.g., boot-time > enumeration finds only A, we create its sysfs reset file, then we > hot-add device B (admittedly mostly a conventional PCI scenario), and > now we're in the same situation you're seeing where A's reset file > exists, but it always fails because SBR is no longer safe. I'd welcome dynamically re-evaluating the reset availability on device add and remove, but it might get a bit complicated. Thanks, Alex