Re: [PATCH v4 3/3] PCI: Add CRS handling to pci_dev_wait()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 13, 2021 at 04:29:51PM +0000, Spassov, Stanislav wrote:
> On Sat, 2021-09-11 at 09:03 -0500, Bjorn Helgaas wrote:
> 
> > I think the Root Complex may eventually complete the transaction as
> > failed *regardless* of whether CRS SV is enabled.  This is unclear in
> > PCIe r5.0, sec 2.3.2, because the text formatting was broken between
> > r4.0 and r5.0.  [...]
> >
> >   A Root Complex implementation may choose to limit the number of
> >   Configuration Request/CRS Completion Status loops before determining
> >   that something is wrong with the target of the Request and taking
> >   appropriate action, e.g., complete the Request to the host as a
> >   failed transaction.
> 
> I can provide a bit more background:
> 
> The issue that prompted me to implement this patch involved a device that
> used CRS Completions to signal post-reset (non-)readiness. In some cases,
> the device would get stuck and continue issuing CRS Completions for all
> requests indefinitely.
> 
> The device was attached directly to a Root Port on a server-grade Intel CPU,
> and CRS SV was enabled on that Root Port. The original pci_dev_wait()
> implementation, by virtue of polling the Command register rather than the
> Vendor ID, would always cause a TOR timeout and associated host crash.
> 
> I later understood the specific CPU did have a proprietary register for
> "limiting the number of loops" that the PCIe spec talks about, and indeed
> that register was set to "no limit". Coupled with the stuck device, these
> indefinite retries eventually triggered TOR timeout.

"No limit" sounds like a pretty bad choice, given that it means the
CPU will essentially hang forever because of a defective I/O device.
There should be a timeout so software can recover (the *device* may
never recover, but that's no reason why the kernel must crash).

> Granted, there are surely Root Complexes that behave differently, since the
> PCIe spec leaves this up to the implementation. Still, this patch increases
> robustness by polling the safer Vendor ID register, which is safer at least
> in some situations, and not any less safe generally. However, it is not a
> simple matter of switching which register is polled due to the SR-IOV
> considerations that require a fallback to Command.

Yes.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux