Re: [PATCH] PCI/PM: Wait longer after reset when active link reporting is supported

Lukas Wunner <lukas@xxxxxxxxx> · Sun, 26 Mar 2023 08:22:07 +0200

[cc += Ashok, Sathya, Ravi Kishore, Sheng Bi, Stanislav, Yang Su, Shuo Tan]

On Wed, Mar 22, 2023 at 05:16:24PM -0500, Bjorn Helgaas wrote:
> On Tue, Mar 21, 2023 at 11:50:31AM +0200, Mika Westerberg wrote:
> > The PCIe spec prescribes that a device may take up to 1 second to
> > recover from reset and this same delay is prescribed when coming out of
> > D3cold (as that involves reset too). The device may extend this 1 second
> > delay through Request Retry Status completions and we accommodate that
> > in Linux with a 60 second cap, but only in the reset code path, not in
> > the resume code path.
> > 
> > However, a device has surfaced, namely Intel Titan Ridge xHCI, which
> > requires a longer delay in the resume code path as well. For this reason,
> > make the resume code path use the same extended delay as the reset
> > path, but only after the link has come up (i.e. when active link
> > reporting is supported), so that we do not wait longer for devices that
> > have become permanently inaccessible during system sleep, e.g. because
> > they have been removed.
> > 
> > While there, move the two constants from the pci.h header into pci.c,
> > as they are no longer used outside of that file.
> > 
> > Reported-by: Chris Chiu <chris.chiu@xxxxxxxxxxxxx>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
> > Cc: Lukas Wunner <lukas@xxxxxxxxx>
> > Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> 
> Lukas just added the "timeout" parameter with ac91e6980563 ("PCI:
> Unify delay handling for reset and resume"), so I'm going to look for
> his ack for this.

Acked-by: Lukas Wunner <lukas@xxxxxxxxx>

> After ac91e6980563, we called pci_bridge_wait_for_secondary_bus() with
> timeouts of either:
> 
>   60s for reset (pci_bridge_secondary_bus_reset() or
>       dpc_reset_link()), or
> 
>    1s for resume (pci_pm_resume_noirq() or pci_pm_runtime_resume() via
>       pci_pm_bridge_power_up_actions())
> 
> If I'm reading this right, the main changes of this patch are:
> 
>   - For slow links (<= 5 GT/s), we sleep 100ms, then previously waited
>     up to 1s (resume) or 60s (reset) for the device to be ready.  Now
>     we will wait a max of 1s for both resume and reset.
> 
>   - For fast links (> 5 GT/s) we wait up to 100ms for the link to come
>     up and fail if it does not.  If the link did come up in 100ms, we
>     previously waited up to 1s (resume) or 60s (reset).  Now we will
>     wait up to 60s for both resume and reset.
> 
> So this *reduces* the time we wait for slow links after reset, and
> *increases* the time for fast links after resume.  Right?

Good point.  So now the wait duration hinges on the link speed
rather than reset versus resume.
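Bjorn's summary of the patched behavior can be sketched as a small pure function. This is only an illustration, not the kernel's code: the function and constant names are hypothetical, link speed is passed in MT/s (so 5000 means 5 GT/s), and `link_came_up` stands for the result of the up-to-100 ms link wait on the > 5 GT/s path.

```c
#include <stdbool.h>

/* Hypothetical names; the values are the ones discussed in this thread. */
#define SKETCH_SHORT_READY_MS	1000	/* link state not checked */
#define SKETCH_LONG_READY_MS	60000	/* link known to be up */

/* Link speed in MT/s, so 5000 == 5 GT/s. */
static bool sketch_is_fast_link(unsigned int speed_mts)
{
	return speed_mts > 5000;
}

/*
 * How long (ms) to keep waiting for the device to become ready once the
 * initial 100 ms has elapsed.  Whether we got here via reset or resume
 * no longer matters; only the link speed does.  Returns 0 when the fast
 * path gives up because the link did not come up within 100 ms.
 */
static unsigned int sketch_ready_timeout_ms(unsigned int speed_mts,
					    bool link_came_up)
{
	if (!sketch_is_fast_link(speed_mts))
		return SKETCH_SHORT_READY_MS;	/* <= 5 GT/s: 1 s cap */
	if (!link_came_up)
		return 0;			/* > 5 GT/s, link down: fail */
	return SKETCH_LONG_READY_MS;		/* > 5 GT/s, link up: 60 s */
}
```

For example, an 8 GT/s port whose link trained gets the 60 s budget after either reset or resume, while a 5 GT/s port gets 1 s in both cases.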

Before ac91e6980563 (which went into v6.3-rc1), the wait duration
after a Secondary Bus Reset and a DPC-induced Hot Reset was
essentially zero.  And the Ponte Vecchio cards which necessitated
ac91e6980563 are usually plugged into servers whose Root Ports
support link speeds > 5 GT/s.  So the risk of breaking anything
with this change seems small.

The reason why Mika is waiting only 1 second in the <= 5 GT/s case
is that we don't check for the link to become active for these slower
link speeds.  That's because Link Active Reporting is only mandatory
if the port is hotplug-capable or supports link speeds > 5 GT/s.
Otherwise it's optional (PCIe r6.0.1 sec 7.5.3.6).
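The rule can be expressed as a check on two capability registers. A minimal sketch: the macro and function names are hypothetical, though the bit positions match the Link Capabilities and Slot Capabilities registers (Linux calls them PCI_EXP_LNKCAP_SLS, PCI_EXP_LNKCAP_DLLLARC and PCI_EXP_SLTCAP_HPC). It assumes the usual Max Link Speed encoding (1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s, ...) and, for simplicity, ignores whether a slot is actually implemented.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LNKCAP_MLS	0x0000000fu	/* Max Link Speed field */
#define SKETCH_LNKCAP_DLLLARC	0x00100000u	/* DLL Link Active Reporting Capable */
#define SKETCH_SLTCAP_HPC	0x00000040u	/* Hot-Plug Capable */
#define SKETCH_MLS_5GT		2u		/* Max Link Speed encoding for 5 GT/s */

/* Must this port implement Link Active Reporting? */
static bool sketch_lar_mandatory(uint32_t lnkcap, uint32_t sltcap)
{
	bool hotplug = sltcap & SKETCH_SLTCAP_HPC;
	bool fast = (lnkcap & SKETCH_LNKCAP_MLS) > SKETCH_MLS_5GT;

	return hotplug || fast;
}

/* Does the port advertise Link Active Reporting, mandatory or not? */
static bool sketch_lar_supported(uint32_t lnkcap)
{
	return lnkcap & SKETCH_LNKCAP_DLLLARC;
}
```

A non-hotplug 5 GT/s port thus falls outside the mandatory case, so software has to consult the capability bit rather than assume the Link Active bit is meaningful.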

It would be user-unfriendly to pause for 60 sec if the device does
not come back after reset or resume (e.g. because it was removed).
The fact that the link is up, on the other hand, indicates that the
device is present but may just need a little more time to respond to
Configuration Space Requests.
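The polling that a longer budget buys can be sketched as below, loosely modeled on (not copied from) the exponential backoff in the kernel's pci_dev_wait(). Everything here is hypothetical: the read and sleep callbacks, and a simulated device standing in for real hardware. As a simplification, an all-ones config read is treated as "not ready yet".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: read one config dword, sleep for ms milliseconds. */
typedef uint32_t (*sketch_cfg_read_fn)(void *ctx);
typedef void (*sketch_sleep_fn)(void *ctx, unsigned int ms);

/*
 * Poll until a config read no longer returns all ones, sleeping 1, 2,
 * 4, ... ms between attempts.  The last sleep is clamped so the total
 * never exceeds timeout_ms.  Returns true if the device became ready.
 */
static bool sketch_wait_ready(sketch_cfg_read_fn cfg_read,
			      sketch_sleep_fn do_sleep,
			      void *ctx, unsigned int timeout_ms)
{
	unsigned int waited = 0, delay = 1;

	for (;;) {
		if (cfg_read(ctx) != 0xffffffffu)
			return true;
		if (waited >= timeout_ms)
			return false;
		if (delay > timeout_ms - waited)
			delay = timeout_ms - waited;
		do_sleep(ctx, delay);
		waited += delay;
		delay *= 2;
	}
}

/* Simulated device for illustration: answers once now_ms >= ready_after_ms. */
struct sketch_sim_dev {
	unsigned int now_ms;
	unsigned int ready_after_ms;
};

static uint32_t sketch_sim_read(void *ctx)
{
	struct sketch_sim_dev *d = ctx;

	/* Arbitrary Vendor/Device ID once ready, all ones before that. */
	return d->now_ms >= d->ready_after_ms ? 0x12348086u : 0xffffffffu;
}

static void sketch_sim_sleep(void *ctx, unsigned int ms)
{
	struct sketch_sim_dev *d = ctx;

	d->now_ms += ms;
}
```

With the simulated device ready after 1.5 s, a 60 s budget succeeds after about 2 s of polling, whereas a 1 s budget gives up at exactly 1000 ms.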

We *could* afford the longer wait duration in the <= 5 GT/s case
as well by checking if Link Active Reporting is supported and further
checking if the link came up after the 100 ms delay prescribed by
PCIe r6.0 sec 6.6.1.  Should Link Active Reporting *not* be supported,
we'd have to retain the shorter wait duration limit of 1 sec.
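The <= 5 GT/s optimization might look roughly like this. It is a hypothetical helper, not proposed kernel code: it assumes the caller has already slept the 100 ms from sec 6.6.1 and, where supported, sampled the Data Link Layer Link Active bit. Treating a link that is still down as a failure is an assumption, mirroring the > 5 GT/s path.

```c
#include <stdbool.h>

/* Hypothetical names; values as discussed in this thread. */
#define SKETCH_SHORT_READY_MS	1000
#define SKETCH_LONG_READY_MS	60000

/*
 * Remaining wait budget (ms) for a <= 5 GT/s link after the initial
 * 100 ms delay.  Returns 0 if the link is known to be down.
 */
static unsigned int sketch_slow_link_timeout_ms(bool lar_supported,
						bool link_active)
{
	if (!lar_supported)
		return SKETCH_SHORT_READY_MS;	/* can't tell: keep 1 s cap */
	if (!link_active)
		return 0;			/* link down: device likely gone */
	return SKETCH_LONG_READY_MS;		/* link up: device is present */
}
```

Only the first branch preserves today's 1 s limit; ports that do report link state would gain the same 60 s budget as faster links.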

This optimization opportunity for the <= 5 GT/s case does not have
to be addressed in this patch.  It could be added later on if it
turns out that users do plug cards such as Ponte Vecchio into older
Gen1/Gen2 Downstream Ports.  (Unless Mika wants to perfect it right
now.)

Thanks,

Lukas