Re: [PATCH v4 3/3] PCI: Add CRS handling to pci_dev_wait()

On Mon, Sep 13, 2021 at 11:04 AM Spassov, Stanislav <stanspas@xxxxxxxxx> wrote:
>
> On Mon, 2021-09-13 at 11:38 -0500, Bjorn Helgaas wrote:
> > On Mon, Sep 13, 2021 at 04:29:51PM +0000, Spassov, Stanislav wrote:
> > > On Sat, 2021-09-11 at 09:03 -0500, Bjorn Helgaas wrote:
> > >
> > > I later understood the specific CPU did have a proprietary register for
> > > "limiting the number of loops" that the PCIe spec talks about, and indeed
> > > that register was set to "no limit". Coupled with the stuck device, these
> > > indefinite retries eventually triggered TOR timeout.
> >
> > "No limit" sounds like a pretty bad choice, given that it means the
> > CPU will essentially hang forever because of a defective I/O device.
> > There should be a timeout so software can recover (the *device* may
> > never recover, but that's no reason why the kernel must crash).
> >
>
> Correct. "No limit" is definitely a bad choice for that register,
> and fixing the value would be preferable to any software solution.
>
> Unfortunately, at least in the case I worked on, that register was
> not accessible by the kernel.

I can confirm that I have come across exactly the same issue (no limit
on retries, resulting in a CPU hang) on another old Intel root port in
the past:
https://lore.kernel.org/linux-pci/53FFA54D.9000907@xxxxxxxxx/
https://lkml.org/lkml/2014/8/1/186

I had the same problem there (no way to limit the number of retries).
I'd be interested in the next patch Stanislav sends out and will keep
a lookout for it!

Thanks!

Rajat

> Intel exposes many CPU configuration
> registers in terms of virtual PCI devices residing directly on Root
> Buses, and the system/platform firmware is able to use vendor-provided
> means to completely hide some of these pseudo-devices from the OS.
>
> Additionally, the way the PCIe spec is phrased, not every Root Complex
> implementation is required to even have such a limiting register, while
> all implementations that advertise CRS SV capability are required to
> behave as prescribed when PCI_VENDOR_ID is read. Hence why I believe
> this patch is a general robustness improvement, rather than a workaround
> for a specific device/platform.
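
For illustration only, here is a minimal user-space sketch of the
behaviour described above: with CRS SV enabled, a read of PCI_VENDOR_ID
returns 0x0001 while the device is still answering with CRS, so software
can poll with its own timeout instead of relying on the Root Complex to
retry forever. This is not the code from the patch. struct fake_dev,
fake_read_id() and wait_for_ready() are hypothetical names, the device is
simulated, and the delays are only counted rather than slept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID value a Root Complex returns under CRS SV while the
 * device is still completing config reads with CRS. */
#define CRS_VENDOR_ID 0x0001u

/* Hypothetical simulated device: answers with CRS for the first
 * crs_reads_left reads, then returns its real ID dword. */
struct fake_dev {
	unsigned int crs_reads_left;
	uint32_t id;	/* device ID in bits 31:16, vendor ID in bits 15:0 */
};

/* Stand-in for a 32-bit config read at offset PCI_VENDOR_ID. */
static uint32_t fake_read_id(struct fake_dev *dev)
{
	if (dev->crs_reads_left > 0) {
		dev->crs_reads_left--;
		return 0xffff0001u;	/* 0x0001 vendor ID, remaining bits all 1s */
	}
	return dev->id;
}

static bool id_is_crs(uint32_t id)
{
	return (id & 0xffffu) == CRS_VENDOR_ID;
}

/*
 * Poll until the device stops answering with CRS or the time budget is
 * used up. The delay doubles after every failed attempt. Returns 0 when
 * the device is ready and -1 on timeout; *waited_ms receives the total
 * (simulated) time spent waiting.
 */
static int wait_for_ready(struct fake_dev *dev, unsigned int timeout_ms,
			  unsigned int *waited_ms)
{
	unsigned int delay = 1, waited = 0;

	for (;;) {
		if (!id_is_crs(fake_read_id(dev))) {
			*waited_ms = waited;
			return 0;
		}
		if (waited >= timeout_ms) {
			*waited_ms = waited;
			return -1;	/* give up; the caller decides how to recover */
		}
		waited += delay;	/* real code would sleep for 'delay' ms here */
		delay *= 2;
	}
}
```

The point of the sketch is that the retry limit lives in software, so a
stuck device produces an error return the caller can handle rather than
an unbounded stall in hardware.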


