Re: Regression caused by "PCI: Probe for device reset support during enumeration"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 14 Dec 2018 11:18:38 -0600
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Sinan, Alex, Rafael]
> 
> On Fri, Dec 14, 2018 at 01:15:58PM +0000, Ross Lagerwall wrote:
> > Hi Bjorn,
> > 
> > Your commit 5b0764cac9f1 ("PCI: Probe for device reset support during
> > enumeration") moved checking whether a device could be reset much earlier to
> > when the device is first probed. When the device is first probed, the other
> > devices on the bus may not have been discovered yet. This means that we will
> > claim to support SBR as a reset mechanism because it is the only device
> > behind the bus at that pointer meanwhile the others simply haven't been
> > discovered yet. This results in dev->reset_fn being incorrectly set to true
> > and a reset file being created. When userspace actually tries to use the
> > reset file it fails because now there are other sibling devices preventing
> > the use of an SBR.  
> 
> First of all, I'm very sorry about the regression and thanks very much
> for the report!  I know it's a lot of work to track down things like
> this.
> 
> We run pci_probe_reset_function() during pci_init_capabilities(),
> which is before other devices on the bus are discovered, as you say.
> That checks not just for SBR support, but for other types of reset
> (FLR, PM, etc) as well.
> 
> In your case the userspace reset actually fails, so I assume that
> means SBR is the *only* supported reset type for that device?
> 
> Since you have two devices on the bus, I guess SBR of device A
> *should* work during the interval between enumerating A and B (it will
> reset both A and B in hardware, but since the OS doesn't know about B
> yet, that's probably OK).  After we enumerate B and potentially bind a
> driver to it, we can't use SBR any more, of course.
> 
> I could imagine removing A's sysfs reset file when B is enumerated,
> but the locking might be ugly.  And of course A might support other
> types of reset, and then you *want* A's sysfs file.

I've actually wished for the opposite of this in the past, given a bus
with devices A and B with no sysfs reset file, I wish I could 'echo 1 >
B/remove' to rescan for SBR for device A.

> What happened prior to 5b0764cac9f1?  There was never a sysfs reset
> file at all, so userspace didn't even attempt the reset?

Yep.

> We could wait to create the sysfs reset file until after we've
> enumerated all the devices on the bus, but then we're opening a race
> when a device could be hot-added to the bus.  E.g., boot-time
> enumeration finds only A, we create its sysfs reset file, then we
> hot-add device B (admittedly mostly a conventional PCI scenario), and
> now we're in the same situation you're seeing where A's reset file
> exists, but it always fails because SBR is no longer safe.

I'd welcome dynamically re-evaluating the reset availability on device
add and remove, but it might get a bit complicated.  Thanks,

Alex



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux