Re: PCIe resets/restore and lack of CRS wait

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Thu, 22 Mar 2018 09:14:02 -0500

On Thu, Mar 22, 2018 at 08:58:06AM -0500, Sinan Kaya wrote:
> On 3/22/2018 8:46 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2018-03-22 at 07:25 -0400, okaya@xxxxxxxxxxxxxx wrote:
> >>> That tells me that there is no guarantee by spec that we'll get
> >>> ffff's, instead we might get HW stalls, or other really nasty
> >>> effects when probing a register other than 0 (VID/DID) for CRS.
> >>
> >> AFAIK, the spec also says that sw needs to observe 0xffffffff for
> >> all registers other than vendor ID during the CRS period.
> > 
> > This isn't what's in the 3.1a spec at least ... section 2.3.2 explains
> > the specified behaviour which is, for any register other than 0
> > (VID/DID), to re-issue the request...
> 
> I don't have any hard preference on this. Bjorn wanted code to work for
> systems with and without CRS capability. That was the reason we stayed
> away from 0xffff0001.

CRS SV is optional, so the code has to work when it's absent.  But we
can tell whether it's supported, so if we need to, we can do something
different when it's absent.
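
As a rough illustration only (not actual kernel code), that decision
could be sketched as below.  The bit names and values mirror Linux's
pci_regs.h; crs_pick_strategy() and the enum are hypothetical names
made up for this example, and "fallback" stands for whatever we decide
to do when CRS SV is unavailable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as in Linux's include/uapi/linux/pci_regs.h */
#define PCI_EXP_RTCTL_CRSSVE 0x0010 /* Root Control: CRS SV Enable */
#define PCI_EXP_RTCAP_CRSVIS 0x0001 /* Root Capabilities: CRS SV supported */

/* Hypothetical: which wait strategy to use after a reset */
enum crs_wait_strategy {
	CRS_WAIT_POLL_VENDOR_ID, /* CRS SV usable: poll Vendor ID for 0x0001 */
	CRS_WAIT_FALLBACK,       /* CRS SV absent or disabled: do something else */
};

/*
 * Hypothetical helper: takes the raw Root Capabilities and Root Control
 * register values of the Root Port above the device.
 */
static enum crs_wait_strategy crs_pick_strategy(uint16_t rtcap, uint16_t rtctl)
{
	bool supported = (rtcap & PCI_EXP_RTCAP_CRSVIS) != 0;
	bool enabled = (rtctl & PCI_EXP_RTCTL_CRSSVE) != 0;

	return (supported && enabled) ? CRS_WAIT_POLL_VENDOR_ID
				      : CRS_WAIT_FALLBACK;
}
```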

> CRS just gives you HW implementation-defined retries for registers
> other than vendor ID, like you mentioned. If the device does not reply
> within this polling period, you should eventually get all 1s back.
> 
> All 1s is the spec way of saying device doesn't exist for config
> transactions.

I remember implementation notes mentioning all 1's data returns, e.g.,
PCIe r4.0, sec 2.3.2, but I don't *think* there's actually a spec
requirement that CRS or other errors be reported that way.  So if
there's a way to avoid relying on all 1's data, I think that would be
a good thing.
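
To make that concrete, here is a minimal sketch, assuming CRS SV is
enabled so that the Root Complex returns Vendor ID 0x0001 for a CRS
completion on a Vendor ID read.  classify_vid_read() and the enum are
hypothetical names for illustration.  The point is that "still in CRS"
is keyed off the 0x0001 Vendor ID only, and all 1's data is reported
separately instead of being treated as "keep waiting".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a dword read at config offset 0 */
enum vid_read_result {
	VID_READY,       /* something other than CRS or all 1's came back */
	VID_CRS_PENDING, /* CRS SV: Vendor ID reads as reserved value 0x0001 */
	VID_NO_RESPONSE, /* all 1's: error or no device; not proof of CRS */
};

static enum vid_read_result classify_vid_read(uint32_t dword0,
					      bool crs_sv_enabled)
{
	/* Vendor ID is the low 16 bits of the dword at offset 0 */
	if (crs_sv_enabled && (dword0 & 0xffff) == 0x0001)
		return VID_CRS_PENDING;

	/* Assumption: all 1's is only a hint, so report it distinctly */
	if (dword0 == 0xffffffffu)
		return VID_NO_RESPONSE;

	return VID_READY;
}
```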

Bjorn