Re: [PATCH] PCI/PM: Wait longer after reset when active link reporting is supported

Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> · Mon, 27 Mar 2023 12:42:50 +0300

Hi,

On Sun, Mar 26, 2023 at 08:22:07AM +0200, Lukas Wunner wrote:
> [cc += Ashok, Sathya, Ravi Kishore, Sheng Bi, Stanislav, Yang Su, Shuo Tan]
> 
> On Wed, Mar 22, 2023 at 05:16:24PM -0500, Bjorn Helgaas wrote:
> > On Tue, Mar 21, 2023 at 11:50:31AM +0200, Mika Westerberg wrote:
> > > The PCIe spec prescribes that a device may take up to 1 second to
> > > recover from reset and this same delay is prescribed when coming out of
> > > D3cold (as that involves reset too). The device may extend this 1 second
> > > delay through Request Retry Status completions and we accommodate for
> > > that in Linux with a 60 second cap, only in reset code path, not in resume
> > > code path.
> > > 
> > > However, a device has surfaced, namely Intel Titan Ridge xHCI, which
> > > requires longer delay also in the resume code path. For this reason make
> > > the resume code path use the same extended delay as the reset path,
> > > but only after the link has come up (active link reporting is
> > > supported) so that we do not wait longer for devices that have
> > > become permanently inaccessible during system sleep, e.g. because they
> > > have been removed.
> > > 
> > > While there, move the two constants from the pci.h header into pci.c as
> > > these are not used outside of that file anymore.
> > > 
> > > Reported-by: Chris Chiu <chris.chiu@xxxxxxxxxxxxx>
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
> > > Cc: Lukas Wunner <lukas@xxxxxxxxx>
> > > Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> > 
> > Lukas just added the "timeout" parameter with ac91e6980563 ("PCI:
> > Unify delay handling for reset and resume"), so I'm going to look for
> > his ack for this.
> 
> Acked-by: Lukas Wunner <lukas@xxxxxxxxx>
> 
> 
> > After ac91e6980563, we called pci_bridge_wait_for_secondary_bus() with
> > timeouts of either:
> > 
> >   60s for reset (pci_bridge_secondary_bus_reset() or
> >       dpc_reset_link()), or
> > 
> >    1s for resume (pci_pm_resume_noirq() or pci_pm_runtime_resume() via
> >       pci_pm_bridge_power_up_actions())
> > 
> > If I'm reading this right, the main changes of this patch are:
> > 
> >   - For slow links (<= 5 GT/s), we sleep 100ms, then previously waited
> >     up to 1s (resume) or 60s (reset) for the device to be ready.  Now
> >     we will wait a max of 1s for both resume and reset.
> > 
> >   - For fast links (> 5 GT/s) we wait up to 100ms for the link to come
> >     up and fail if it does not.  If the link did come up in 100ms, we
> >     previously waited up to 1s (resume) or 60s (reset).  Now we will
> >     wait up to 60s for both resume and reset.
> > 
> > So this *reduces* the time we wait for slow links after reset, and
> > *increases* the time for fast links after resume.  Right?
> 
> Good point.  So now the wait duration hinges on the link speed
> rather than reset versus resume.
> 
> Before ac91e6980563 (which went into v6.3-rc1), the wait duration
> after a Secondary Bus Reset and a DPC-induced Hot Reset was
> essentially zero.  And the Ponte Vecchio cards which necessitated
> ac91e6980563 are usually plugged into servers whose Root Ports
> support link speeds > 5 GT/s.  So the risk of breaking anything
> with this change seems small.
> 
> The reason why Mika is waiting only 1 second in the <= 5 GT/s case
> is that we don't check for the link to become active for these slower
> link speeds.  That's because Link Active Reporting is only mandatory
> if the port is hotplug-capable or supports link speeds > 5 GT/s.
> Otherwise it's optional (PCIe r6.0.1 sec 7.5.3.6).
> 
> It would be user-unfriendly to pause for 60 sec if the device does
> not come back after reset or resume (e.g. because it was removed)
> and the fact that the link is up is an indication that the device
> is present, but may just need a little more time to respond to
> Configuration Space Requests.
> 
> We *could* afford the longer wait duration in the <= 5 GT/s case
> as well by checking if Link Active Reporting is supported and further
> checking if the link came up after the 100 ms delay prescribed by
> PCIe r6.0 sec 6.6.1.  Should Link Active Reporting *not* be supported,
> we'd have to retain the shorter wait duration limit of 1 sec.
> 
> This optimization opportunity for the <= 5 GT/s case does not have
> to be addressed in this patch.  It could be added later on if it
> turns out that users do plug cards such as Ponte Vecchio into older
> Gen1/Gen2 Downstream Ports.  (Unless Mika wants to perfect it right
> now.)
> 

No problem doing that :) I guess you mean something like the diff below,
so that we use active link reporting and the longer wait also for any
downstream port that supports it, regardless of the speed.

I can update the patch accordingly.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 36d4aaa8cea2..b507a26ffb9d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5027,7 +5027,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	if (!pcie_downstream_port(dev))
 		return 0;
 
-	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
+	if (!dev->link_active_reporting &&
+	    pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
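
To make the resulting behaviour easier to follow, here is a rough userspace
model of the decision the hunk above ends up making. It is only a sketch and
not kernel code: the function name, the enum and the three constants are
made up for illustration, the millisecond values simply mirror the 100 ms,
1 s and 60 s figures discussed in this thread, and -1 stands in for "link
did not come up, give up".

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants only; they mirror the figures quoted in the thread. */
#define MODEL_LINK_DELAY_MS      100   /* initial delay after reset/resume */
#define MODEL_RESET_WAIT_MS     1000   /* short cap, link state not checked */
#define MODEL_EXTENDED_WAIT_MS 60000   /* long cap, used once the link is up */

/* Hypothetical stand-in for the kernel's link speed values, ordered by speed. */
enum model_speed {
	MODEL_SPEED_2_5GT,
	MODEL_SPEED_5_0GT,
	MODEL_SPEED_8_0GT,
	MODEL_SPEED_16_0GT,
};

/*
 * Returns how long (in ms) the child device may still take to respond
 * once the initial delay has passed, or -1 if the link never came up.
 *
 * link_came_up is only consulted on the path that checks the link state.
 */
static int model_child_wait_ms(bool link_active_reporting,
			       enum model_speed speed_cap,
			       bool link_came_up)
{
	/*
	 * Slow port without active link reporting: we cannot tell whether
	 * the device is still there, so keep the short 1 s overall cap.
	 */
	if (!link_active_reporting && speed_cap <= MODEL_SPEED_5_0GT)
		return MODEL_RESET_WAIT_MS - MODEL_LINK_DELAY_MS;

	/* Link state is checked: no link means the device is gone. */
	if (!link_came_up)
		return -1;

	/* Link is up, so the device is present and may just be slow. */
	return MODEL_EXTENDED_WAIT_MS;
}
```

With this model a Gen1/Gen2 port that does support active link reporting
gets the long wait once its link is up, and only ports where the link state
is unknown stay on the short one.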