Re: PCI CRS Support


Hi Sinan,

On Wed, Aug 24, 2016 at 11:56:18AM -0400, Sinan Kaya wrote:
> Hi Bjorn,
> I see that the kernel supports Configuration Request Retry Status (CRS) Software Visibility,
> and that it is discovered and enabled as part of the probe function.
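For context, enabling CRS Software Visibility amounts to checking the CRS Software Visibility bit in a Root Port's Root Capabilities register and, if it is set, setting CRS Software Visibility Enable in Root Control. The sketch below models that on a fake in-memory PCIe capability. The register offsets and bit values follow Linux's pci_regs.h (PCI_EXP_RTCAP, PCI_EXP_RTCAP_CRSVIS, PCI_EXP_RTCTL, PCI_EXP_RTCTL_CRSSVE); struct fake_root_port, cap_read16(), cap_write16() and enable_crs_sv() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within the PCIe capability and bit values, as in pci_regs.h. */
#define PCI_EXP_RTCTL         0x1c   /* Root Control */
#define PCI_EXP_RTCTL_CRSSVE  0x0010 /* CRS Software Visibility Enable */
#define PCI_EXP_RTCAP         0x1e   /* Root Capabilities */
#define PCI_EXP_RTCAP_CRSVIS  0x0001 /* CRS Software Visibility capable */

/* Hypothetical in-memory copy of a Root Port's PCIe capability. */
struct fake_root_port {
    uint8_t pcie_cap[0x40];
};

/* Little-endian 16-bit read from the fake capability. */
static uint16_t cap_read16(const struct fake_root_port *rp, unsigned off)
{
    assert(off + 1 < sizeof(rp->pcie_cap));
    return (uint16_t)(rp->pcie_cap[off] | (rp->pcie_cap[off + 1] << 8));
}

/* Little-endian 16-bit write to the fake capability. */
static void cap_write16(struct fake_root_port *rp, unsigned off, uint16_t val)
{
    assert(off + 1 < sizeof(rp->pcie_cap));
    rp->pcie_cap[off] = (uint8_t)(val & 0xff);
    rp->pcie_cap[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Set CRSSVE in Root Control only if Root Capabilities advertises CRSVIS.
 * Returns 1 if CRS Software Visibility is now enabled, 0 otherwise.
 */
static int enable_crs_sv(struct fake_root_port *rp)
{
    uint16_t rtcap = cap_read16(rp, PCI_EXP_RTCAP);

    if (!(rtcap & PCI_EXP_RTCAP_CRSVIS))
        return 0;
    cap_write16(rp, PCI_EXP_RTCTL,
                (uint16_t)(cap_read16(rp, PCI_EXP_RTCTL) | PCI_EXP_RTCTL_CRSSVE));
    return 1;
}
```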
> 
> Let's assume a system that supports CRS and has CRS Software Visibility enabled as above.
> I do not see any code in the failure/reset path that handles CRS completions
> returned by the endpoint.
> 
> An endpoint is allowed to return CRS after several reset types. I'm pasting the relevant part of
> Section 2.3.1, Request Handling Rules, of the PCIe 3.1 spec:
> 
> "For Configuration Requests only, following reset it is possible for a device to terminate the request
> but indicate that it is temporarily unable to process the Request, but will be able to process the Request
> in the future – in this case, the Configuration Request Retry Status (CRS) Completion Status is used
> (see Section 6.6). Valid reset conditions after which a device is permitted to return CRS are:
> 
> - Cold, Warm, and Hot Resets
> - FLRs
> - A reset initiated in response to a D3hot to D0uninitialized device state transition."
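With CRS Software Visibility enabled, a configuration read that covers the Vendor ID and completes with CRS is returned to software with Vendor ID 0001h (a value the PCI-SIG reserved for this purpose) and all 1s in any other bytes, so a 32-bit read at offset 0 yields 0xFFFF0001. A minimal sketch of how software might classify such a read follows; the enum and classify_vendor_read() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of reading the Vendor/Device ID dword at offset 0. */
enum vid_status {
    VID_PRESENT, /* a real Vendor ID came back */
    VID_ABSENT,  /* all 1s: nothing answered */
    VID_CRS      /* Vendor ID 0001h: the device asked us to retry later */
};

static enum vid_status classify_vendor_read(uint32_t dw)
{
    if (dw == 0xffffffffu)
        return VID_ABSENT;
    /* 0001h is reserved for CRS Software Visibility; the remaining bytes
     * of the read are all 1s, so a dword read gives 0xFFFF0001. */
    if ((dw & 0xffffu) == 0x0001u)
        return VID_CRS;
    return VID_PRESENT;
}
```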
> 
> I have identified the following functions that have problems with warm and hot resets:
> 
> - Callers of pci_reset_bridge_secondary_bus(), such as pciehp_reset_slot() and aer_root_reset().
> - Higher-level callers such as pci_bus_reset() and pci_try_reset_bus(), and their callers in VFIO.
> All of these paths are affected by CRS. They perform the secondary bus reset but do not wait for the
> endpoint to respond. A fixed 1-second delay does not guarantee that the endpoint will be ready
> once it expires. A CRS-capable OS needs to interpret the incoming CRS completion and keep polling,
> since CRS visibility is set.
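The polling described here could look roughly like the following: after the reset, keep rereading the Vendor ID while it reads as 0001h, doubling the delay each time (1, 2, 4, ... ms), and give up once the next delay would exceed a timeout. The doubling backoff is loosely modeled on the CRS loop the kernel already uses when probing devices, but wait_for_crs_clear(), the read/sleep hooks and the simulated device are all hypothetical; the hooks exist only so the policy can run without hardware or real sleeps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks: read the dword at config offset 0, and sleep. */
typedef uint32_t (*read_id_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned ms);

/*
 * Reread the Vendor/Device ID dword while it reports CRS (Vendor ID 0001h),
 * doubling the delay from 1 ms. Gives up once the next delay would exceed
 * timeout_ms, so total sleep can approach 2 * timeout_ms.
 * Returns 1 and stores the ID on success, 0 on timeout or all-1s data.
 */
static int wait_for_crs_clear(void *ctx, read_id_fn rd, sleep_ms_fn sl,
                              unsigned timeout_ms, uint32_t *id_out)
{
    unsigned delay = 1;
    uint32_t dw;

    assert(rd && sl && id_out);
    dw = rd(ctx);
    while ((dw & 0xffffu) == 0x0001u) {
        if (delay > timeout_ms)
            return 0;          /* still CRS: treat the device as stuck */
        sl(ctx, delay);
        delay *= 2;
        dw = rd(ctx);
    }
    if (dw == 0xffffffffu)
        return 0;              /* nothing answered at all */
    *id_out = dw;
    return 1;
}

/* Simulated endpoint that answers CRS until its clock reaches ready_at_ms. */
struct sim_dev {
    unsigned now_ms;
    unsigned ready_at_ms;
    uint32_t id;               /* Device ID << 16 | Vendor ID */
};

static uint32_t sim_read(void *ctx)
{
    const struct sim_dev *d = ctx;
    return d->now_ms >= d->ready_at_ms ? d->id : 0xffff0001u;
}

static void sim_sleep(void *ctx, unsigned ms)
{
    struct sim_dev *d = ctx;
    d->now_ms += ms;           /* advance simulated time instead of sleeping */
}
```

For example, a simulated device that becomes ready 1.5 s after reset is found after 1 + 2 + ... + 1024 = 2047 ms of cumulative backoff, while one that needs 100 s is abandoned with a 60 s timeout.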
> 
> All of the above concerns warm and hot resets.
> 
> I also see a problem in the FLR path: pci_flr_wait() only makes a best-effort wait of up to
> 1 second.
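To make the FLR concern concrete, here is a simplified model of a fixed-budget wait: sleep 100 ms, reread the Command register, and stop once the data is no longer all 1s or about a second has passed. It is a sketch of the general shape, not a copy of pci_flr_wait(); struct flr_sim and best_effort_flr_wait() are hypothetical, and a simulated clock replaces msleep().

```c
#include <assert.h>
#include <stdint.h>

/* Simulated device whose config reads return all 1s until it is ready. */
struct flr_sim {
    unsigned now_ms;
    unsigned ready_at_ms;
};

static uint32_t flr_sim_read_command(const struct flr_sim *d)
{
    /* Arbitrary Command/Status value for a device that is ready. */
    return d->now_ms >= d->ready_at_ms ? 0x00100006u : 0xffffffffu;
}

/*
 * Fixed-budget wait: every 100 ms reread the Command register and stop
 * once it is no longer all 1s or budget_ms has elapsed. Returns the time
 * waited on success, -1 if the device still reads as all 1s at the end.
 * It never looks for CRS, only for all-1s data.
 */
static int best_effort_flr_wait(struct flr_sim *d, unsigned budget_ms)
{
    unsigned waited = 0;
    uint32_t val;

    assert(d);
    do {
        d->now_ms += 100;      /* stands in for msleep(100) */
        waited += 100;
        val = flr_sim_read_command(d);
    } while (val == 0xffffffffu && waited < budget_ms);

    return val == 0xffffffffu ? -1 : (int)waited;
}
```

With a 1000 ms budget, a device that becomes readable at 300 ms is reported after 300 ms, but one that needs 1500 ms is declared failed even though it would have come back.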
> 
> Where do we go from here? I was thinking of putting something deep inside the secondary bus reset
> function, but I'm afraid it will break things, especially since we may wait up to 60 seconds.

I agree CRS handling after reset is probably all broken.

I hate the fact that we reset devices without re-enumerating them.  We
have no assurance that the device is the same after reset (it could
have loaded new firmware and been completely reconfigured).
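Short of re-enumerating, one partial safeguard is to snapshot the identity registers before the reset and compare them afterward. A sketch with a hypothetical struct dev_ident and same_device_after_reset() follows; matching IDs would still not prove that the firmware or configuration is unchanged, which is the concern above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of identity registers read before a reset. */
struct dev_ident {
    uint16_t vendor;
    uint16_t device;
    uint8_t revision;
    uint32_t class_code;       /* 24-bit class code */
};

/* Returns 1 if the post-reset identity matches the pre-reset snapshot. */
static int same_device_after_reset(const struct dev_ident *before,
                                   const struct dev_ident *after)
{
    assert(before && after);
    return before->vendor == after->vendor &&
           before->device == after->device &&
           before->revision == after->revision &&
           before->class_code == after->class_code;
}
```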

I don't have any good suggestions for you, so if you have some ideas
and want to fix it, please go ahead.

Bjorn