Re: [RFC PATCH 0/3] PCI: Use Configuration RRS to wait for device ready

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/28/2024 16:42, Bjorn Helgaas wrote:
On Wed, Aug 28, 2024 at 04:24:01PM -0500, Mario Limonciello wrote:
On 8/27/2024 18:48, Bjorn Helgaas wrote:
From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

After a device reset, pci_dev_wait() waits for a device to become
completely ready by polling the PCI_COMMAND register.  The spec envisions
that software would instead poll for the device to stop responding to
config reads with Completions with Request Retry Status (RRS).

Polling PCI_COMMAND leads to hardware retries that are invisible to
software and the backoff between software retries doesn't work correctly.

Root Ports are not required to support the Configuration RRS Software
Visibility feature that prevents hardware retries and makes the RRS
Completions visible to software, so this series only uses it when available
and falls back to PCI_COMMAND polling when it's not.

This is completely untested and posted for comments.

Bjorn Helgaas (3):
    PCI: Wait for device readiness with Configuration RRS
    PCI: aardvark: Correct Configuration RRS checking
    PCI: Rename CRS Completion Status to RRS

   drivers/bcma/driver_pci_host.c             | 10 ++--
   drivers/pci/controller/dwc/pcie-tegra194.c | 18 +++---
   drivers/pci/controller/pci-aardvark.c      | 64 +++++++++++-----------
   drivers/pci/controller/pci-xgene.c         |  6 +-
   drivers/pci/controller/pcie-iproc.c        | 18 +++---
   drivers/pci/pci-bridge-emul.c              |  4 +-
   drivers/pci/pci.c                          | 41 +++++++++-----
   drivers/pci/pci.h                          | 11 +++-
   drivers/pci/probe.c                        | 33 +++++------
   include/linux/bcma/bcma_driver_pci.h       |  2 +-
   include/linux/pci.h                        |  1 +
   include/uapi/linux/pci_regs.h              |  6 +-
   12 files changed, 117 insertions(+), 97 deletions(-)

Although this looks like a useful series, I'm sorry to say but this doesn't
solve the issue that Gary and I raised.  We double checked today and found
that reading the vendor ID works just fine at this time.

Thanks for testing that.

Sure.


I think that we're still better off polling PCI_PM_CTRL to "wait" for D0
after the state change from D3cold.

Is there some spec justification for polling PCI_PM_CTRL?  I'm dubious
about doing that just because "it works" in this situation, unless we
have some better understanding about *why* it works and whether all
devices are supposed to work that way.


I mentioned this a little bit in my patch 3/5 in my submission. The issue isn't "normal" D3cold exit that is fully settled down. That takes ~6ms from measurements. The issue is how long it takes for D3cold *entry* followed by *exit*.

When this issue occurs it's tied with a tight loop of runtime PM entry and exit happening in that short window. That's why it can be tripped by unplugging a dock, waiting until ~approximately autosuspend delay and plugging it back in. If you catch the right timing then the USB4 router is still on it's way down to D3cold.

In terms of a way to match this problem to the spec, the closest I could think is PCI-PM spec.

But I do see in the PCI-PM spec that the delay for D0->D3hot should be 10ms. In the Linux kernel implementation __pci_set_power_state() when called with D3cold calls pci_set_low_power_state() which does wait 10ms followed by using the platform to remove power.

I can't find any timing requirements for D3hot->D3cold transition though.

I would hypothesize that if we injected a longer delay on the "other end" for the D3cold transition entry it would solve this issue as well though.




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux