On 3/27/23 2:42 AM, Mika Westerberg wrote:
> Hi,
>
> On Sun, Mar 26, 2023 at 08:22:07AM +0200, Lukas Wunner wrote:
>> [cc += Ashok, Sathya, Ravi Kishore, Sheng Bi, Stanislav, Yang Su, Shuo Tan]
>>
>> On Wed, Mar 22, 2023 at 05:16:24PM -0500, Bjorn Helgaas wrote:
>>> On Tue, Mar 21, 2023 at 11:50:31AM +0200, Mika Westerberg wrote:
>>>> The PCIe spec prescribes that a device may take up to 1 second to
>>>> recover from reset and this same delay is prescribed when coming out of
>>>> D3cold (as that involves reset too). The device may extend this 1 second
>>>> delay through Request Retry Status completions and we accommodate
>>>> that in Linux with a 60 second cap, but only in the reset code path,
>>>> not in the resume code path.
>>>>
>>>> However, a device has surfaced, namely Intel Titan Ridge xHCI, which
>>>> requires a longer delay also in the resume code path. For this reason make
>>>> the resume code path use the same extended delay as the reset
>>>> path, but only after the link has come up (active link reporting is
>>>> supported) so that we do not wait longer for devices that have
>>>> become permanently inaccessible during system sleep, e.g. because they
>>>> have been removed.
>>>>
>>>> While there, move the two constants from the pci.h header into pci.c as
>>>> these are not used outside of that file anymore.
>>>>
>>>> Reported-by: Chris Chiu <chris.chiu@xxxxxxxxxxxxx>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
>>>> Cc: Lukas Wunner <lukas@xxxxxxxxx>
>>>> Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
>>>
>>> Lukas just added the "timeout" parameter with ac91e6980563 ("PCI:
>>> Unify delay handling for reset and resume"), so I'm going to look for
>>> his ack for this.
>>
>> Acked-by: Lukas Wunner <lukas@xxxxxxxxx>
>>
>>
>>> After ac91e6980563, we called pci_bridge_wait_for_secondary_bus() with
>>> timeouts of either:
>>>
>>>   60s for reset (pci_bridge_secondary_bus_reset() or
>>>       dpc_reset_link()), or
>>>
>>>    1s for resume (pci_pm_resume_noirq() or pci_pm_runtime_resume() via
>>>       pci_pm_bridge_power_up_actions())
>>>
>>> If I'm reading this right, the main changes of this patch are:
>>>
>>>   - For slow links (<= 5 GT/s), we sleep 100ms, then previously waited
>>>     up to 1s (resume) or 60s (reset) for the device to be ready. Now
>>>     we will wait a max of 1s for both resume and reset.
>>>
>>>   - For fast links (> 5 GT/s) we wait up to 100ms for the link to come
>>>     up and fail if it does not. If the link did come up in 100ms, we
>>>     previously waited up to 1s (resume) or 60s (reset). Now we will
>>>     wait up to 60s for both resume and reset.
>>>
>>> So this *reduces* the time we wait for slow links after reset, and
>>> *increases* the time for fast links after resume. Right?
>>
>> Good point. So now the wait duration hinges on the link speed
>> rather than reset versus resume.
>>
>> Before ac91e6980563 (which went into v6.3-rc1), the wait duration
>> after a Secondary Bus Reset and a DPC-induced Hot Reset was
>> essentially zero. And the Ponte Vecchio cards which necessitated
>> ac91e6980563 are usually plugged into servers whose Root Ports
>> support link speeds > 5 GT/s. So the risk of breaking anything
>> with this change seems small.
>>
>> The reason why Mika is waiting only 1 second in the <= 5 GT/s case
>> is that we don't check for the link to become active for these slower
>> link speeds. That's because Link Active Reporting is only mandatory
>> if the port is hotplug-capable or supports link speeds > 5 GT/s.
>> Otherwise it's optional (PCIe r6.0.1 sec 7.5.3.6).
>>
>> It would be user-unfriendly to pause for 60 sec if the device does
>> not come back after reset or resume (e.g. because it was removed)
>> and the fact that the link is up is an indication that the device
>> is present, but may just need a little more time to respond to
>> Configuration Space Requests.
>>
>> We *could* afford the longer wait duration in the <= 5 GT/s case
>> as well by checking if Link Active Reporting is supported and further
>> checking if the link came up after the 100 ms delay prescribed by
>> PCIe r6.0 sec 6.6.1. Should Link Active Reporting *not* be supported,
>> we'd have to retain the shorter wait duration limit of 1 sec.
>>
>> This optimization opportunity for the <= 5 GT/s case does not have
>> to be addressed in this patch. It could be added later on if it
>> turns out that users do plug cards such as Ponte Vecchio into older
>> Gen1/Gen2 Downstream Ports. (Unless Mika wants to perfect it right
>> now.)
>>
>
> No problem doing that :) I guess you mean something like the diff below,
> so that we use active link reporting and the longer time also for any
> downstream port that supports it, regardless of the speed.
>
> I can update the patch accordingly.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 36d4aaa8cea2..b507a26ffb9d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5027,7 +5027,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
>  	if (!pcie_downstream_port(dev))
>  		return 0;
>
> -	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> +	if (!dev->link_active_reporting &&
> +	    pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {

Do we still need the speed check? It looks like we can take this path
if link active reporting is not supported.

>  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
>  		msleep(delay);
>  		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
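
One way to reason about the speed-check question is to model both
conditions and compare them across port configurations. The sketch
below is a standalone C illustration, not kernel code: struct
sketch_port, enum sketch_speed and every function name are hypothetical
stand-ins. The compliance rule encodes only the statement quoted in the
thread that Link Active Reporting is mandatory for ports supporting
more than 5 GT/s; hotplug capability is left out for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's link speed values. */
enum sketch_speed { SKETCH_2_5GT, SKETCH_5_0GT, SKETCH_8_0GT, SKETCH_16_0GT };

/* Hypothetical model of a downstream port; this is not struct pci_dev. */
struct sketch_port {
	bool link_active_reporting;
	enum sketch_speed speed_cap;
};

/* Condition from the proposed diff: no reporting AND a slow link. */
static bool short_wait_with_speed_check(const struct sketch_port *p)
{
	return !p->link_active_reporting && p->speed_cap <= SKETCH_5_0GT;
}

/* Condition suggested in the reply: no reporting, regardless of speed. */
static bool short_wait_without_speed_check(const struct sketch_port *p)
{
	return !p->link_active_reporting;
}

/*
 * Assumption taken from the thread: a port faster than 5 GT/s must
 * support Link Active Reporting. Slower ports may or may not.
 */
static bool sketch_port_is_spec_compliant(const struct sketch_port *p)
{
	assert(p != NULL);
	return p->speed_cap <= SKETCH_5_0GT || p->link_active_reporting;
}

/* Count the modeled port configurations where the two conditions differ. */
static int count_disagreements(bool compliant_only)
{
	int count = 0;

	for (int lar = 0; lar <= 1; lar++) {
		for (int s = SKETCH_2_5GT; s <= SKETCH_16_0GT; s++) {
			struct sketch_port p = {
				.link_active_reporting = (lar != 0),
				.speed_cap = (enum sketch_speed)s,
			};

			if (compliant_only && !sketch_port_is_spec_compliant(&p))
				continue;
			if (short_wait_with_speed_check(&p) !=
			    short_wait_without_speed_check(&p))
				count++;
		}
	}
	return count;
}
```

Under these assumptions, count_disagreements(true) returns 0 and
count_disagreements(false) returns 2: the two conditions differ only for
a port faster than 5 GT/s that lacks Link Active Reporting, which the
quoted spec section would not permit. Whether the kernel should still
guard against such a port is a judgment call the sketch does not settle.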