Re: [PATCH] PCI/PM: Wait longer after reset when active link reporting is supported

Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> · Mon, 27 Mar 2023 08:08:21 -0700

On 3/27/23 2:42 AM, Mika Westerberg wrote:
> Hi,
> 
> On Sun, Mar 26, 2023 at 08:22:07AM +0200, Lukas Wunner wrote:
>> [cc += Ashok, Sathya, Ravi Kishore, Sheng Bi, Stanislav, Yang Su, Shuo Tan]
>>
>> On Wed, Mar 22, 2023 at 05:16:24PM -0500, Bjorn Helgaas wrote:
>>> On Tue, Mar 21, 2023 at 11:50:31AM +0200, Mika Westerberg wrote:
>>>> The PCIe spec prescribes that a device may take up to 1 second to
>>>> recover from reset and this same delay is prescribed when coming out of
>>>> D3cold (as that involves reset too). The device may extend this 1 second
>>>> delay through Request Retry Status completions and we accommodate
>>>> that in Linux with a 60 second cap, but only in the reset code path,
>>>> not in the resume code path.
>>>>
>>>> However, a device has surfaced, namely Intel Titan Ridge xHCI, which
>>>> requires a longer delay also in the resume code path. For this reason
>>>> make the resume code path use the same extended delay as the reset
>>>> path, but only after the link has come up (active link reporting is
>>>> supported) so that we do not wait longer for devices that have
>>>> become permanently inaccessible during system sleep, e.g. because they
>>>> have been removed.
>>>>
>>>> While there, move the two constants from the pci.h header into pci.c as
>>>> these are not used outside of that file anymore.
>>>>
>>>> Reported-by: Chris Chiu <chris.chiu@xxxxxxxxxxxxx>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
>>>> Cc: Lukas Wunner <lukas@xxxxxxxxx>
>>>> Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
>>>
>>> Lukas just added the "timeout" parameter with ac91e6980563 ("PCI:
>>> Unify delay handling for reset and resume"), so I'm going to look for
>>> his ack for this.
>>
>> Acked-by: Lukas Wunner <lukas@xxxxxxxxx>
>>
>>
>>> After ac91e6980563, we called pci_bridge_wait_for_secondary_bus() with
>>> timeouts of either:
>>>
>>>   60s for reset (pci_bridge_secondary_bus_reset() or
>>>       dpc_reset_link()), or
>>>
>>>    1s for resume (pci_pm_resume_noirq() or pci_pm_runtime_resume() via
>>>       pci_pm_bridge_power_up_actions())
>>>
>>> If I'm reading this right, the main changes of this patch are:
>>>
>>>   - For slow links (<= 5 GT/s), we sleep 100ms, then previously waited
>>>     up to 1s (resume) or 60s (reset) for the device to be ready.  Now
>>>     we will wait a max of 1s for both resume and reset.
>>>
>>>   - For fast links (> 5 GT/s) we wait up to 100ms for the link to come
>>>     up and fail if it does not.  If the link did come up in 100ms, we
>>>     previously waited up to 1s (resume) or 60s (reset).  Now we will
>>>     wait up to 60s for both resume and reset.
>>>
>>> So this *reduces* the time we wait for slow links after reset, and
>>> *increases* the time for fast links after resume.  Right?
>>
>> Good point.  So now the wait duration hinges on the link speed
>> rather than reset versus resume.
>>
>> Before ac91e6980563 (which went into v6.3-rc1), the wait duration
>> after a Secondary Bus Reset and a DPC-induced Hot Reset was
>> essentially zero.  And the Ponte Vecchio cards which necessitated
>> ac91e6980563 are usually plugged into servers whose Root Ports
>> support link speeds > 5 GT/s.  So the risk of breaking anything
>> with this change seems small.
>>
>> The reason why Mika is waiting only 1 second in the <= 5 GT/s case
>> is that we don't check for the link to become active for these slower
>> link speeds.  That's because Link Active Reporting is only mandatory
>> if the port is hotplug-capable or supports link speeds > 5 GT/s.
>> Otherwise it's optional (PCIe r6.0.1 sec 7.5.3.6).
>>
>> It would be user-unfriendly to pause for 60 sec if the device does
>> not come back after reset or resume (e.g. because it was removed)
>> and the fact that the link is up is an indication that the device
>> is present, but may just need a little more time to respond to
>> Configuration Space Requests.
>>
>> We *could* afford the longer wait duration in the <= 5 GT/s case
>> as well by checking if Link Active Reporting is supported and further
>> checking if the link came up after the 100 ms delay prescribed by
>> PCIe r6.0 sec 6.6.1.  Should Link Active Reporting *not* be supported,
>> we'd have to retain the shorter wait duration limit of 1 sec.
>>
>> This optimization opportunity for the <= 5 GT/s case does not have
>> to be addressed in this patch.  It could be added later on if it
>> turns out that users do plug cards such as Ponte Vecchio into older
>> Gen1/Gen2 Downstream Ports.  (Unless Mika wants to perfect it right
>> now.)
>>
> 
> No problem doing that :) I guess you mean something like the diff below,
> so that we use active link reporting and the longer time also for any
> downstream port that supports it, regardless of the speed.
> 
> I can update the patch accordingly.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 36d4aaa8cea2..b507a26ffb9d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5027,7 +5027,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
>  	if (!pcie_downstream_port(dev))
>  		return 0;
>  
> -	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> +	if (!dev->link_active_reporting &&
> +	    pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {

Do we still need the speed check? It looks like we can take this path
whenever link active reporting is not supported.

>  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
>  		msleep(delay);
>  		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
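
To make the question concrete, here is a minimal userspace sketch (not
kernel code) comparing the condition in the diff with the simplified one.
The struct, the function names and the two constants are hypothetical
stand-ins; the 1 s and 60 s caps are the values discussed above, and the
model assumes the link actually came up on the longer path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the 1 s and 60 s caps discussed in the thread. */
#define SHORT_WAIT_CAP_MS  1000
#define LONG_WAIT_CAP_MS  60000

/* Hypothetical model of a downstream port; not struct pci_dev. */
struct port_model {
	bool link_active_reporting;	/* port reports link active state */
	bool fast_link;			/* port supports speeds > 5 GT/s */
};

/* Condition as in the diff: short path only if no reporting AND slow link. */
static int wait_cap_with_speed_check(const struct port_model *p)
{
	if (!p->link_active_reporting && !p->fast_link)
		return SHORT_WAIT_CAP_MS;
	return LONG_WAIT_CAP_MS;
}

/* Simplified condition: short path whenever reporting is not supported. */
static int wait_cap_without_speed_check(const struct port_model *p)
{
	if (!p->link_active_reporting)
		return SHORT_WAIT_CAP_MS;
	return LONG_WAIT_CAP_MS;
}

/* Count how many of the four port configurations the two variants disagree on. */
static int count_disagreements(void)
{
	int n = 0;

	for (int r = 0; r < 2; r++) {
		for (int f = 0; f < 2; f++) {
			struct port_model p = {
				.link_active_reporting = (r != 0),
				.fast_link = (f != 0),
			};

			if (wait_cap_with_speed_check(&p) !=
			    wait_cap_without_speed_check(&p))
				n++;
		}
	}
	return n;
}
```

In this model the two conditions disagree only for a port that supports
> 5 GT/s but lacks link active reporting, which should not exist given
that reporting is mandatory at those speeds, as Lukas noted above.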

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer