Re: [PATCH v2 2/2] PCI/PM: Decrease wait time for devices behind slow links

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Thu, 6 Apr 2023 12:55:07 -0500

On Wed, Apr 05, 2023 at 12:39:29PM +0300, Mika Westerberg wrote:
> On Tue, Apr 04, 2023 at 04:36:55PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 04, 2023 at 08:27:14AM +0300, Mika Westerberg wrote:
> > > In order speed up reset and resume time of devices behind slow links,
> > > decrease the wait time to 1s. This should give enough time for them to
> > > respond.
> > 
> > Is there some spec language behind this?  In sec 6.6.1, I see that all
> > devices "must be able to receive a Configuration Request and return a
> > Successful Completion".
> > 
> > A preceding rule says devices with slow links must enter LTSSM Detect
> > within 20ms, but I don't see a direct connection from that to a
> > shorter wait time.
> 
> I think this (PCIe 5.0 p. 553):
> 
> "Following a Conventional Reset of a device, within 1.0 s the device
>  must be able to receive a Configuration Request and return a Successful
>  Completion if the Request is valid."

Right, I think this applies to all devices, regardless of link speed.

> > > While doing this, instead of looking at the speed we check if
> > > the port supports active link reporting.
> > 
> > Why check dev->link_active_reporting (i.e., PCI_EXP_LNKCAP_DLLLARC)
> > instead of the link speed described by the spec?
> 
> This is what Sathyanarayanan suggested in the previous version comments.
> 
> > DLLLARC is required for fast links, but it's not prohibited for slower
> > links and it's *required* for hotplug ports with slow links, so
> > dev->link_active_reporting is not completely determined by link speed.
> > 
> > IIUC, the current code basically has these cases:
> > 
> >   1) All devices on secondary bus have zero D3cold delay:
> >        return immediately; no delay at all
> > 
> >   2) Non-PCIe bridge:
> >        sleep 1000ms
> >        sleep  100ms (typical, depends on downstream devices)
> > 
> >   3) Speed <= 5 GT/s:
> >        sleep 100ms (typical)
> >        sleep up to 59.9s (typical) waiting for valid config read
> > 
> >   4) Speed > 5 GT/s (DLLLARC required):
> >        sleep 20ms
> >        sleep up to 1000ms waiting for DLLLA
> >        sleep 100ms (typical)
> >        sleep up to 59.9s (typical) waiting for valid config read
> > 
> > This patch changes cases 3) and 4) to:
> > 
> >   3) DLLLARC not supported:
> >        sleep 100ms (typical)
> >        sleep up to 1.0s (typical) waiting for valid config read
> > 
> >   4) DLLLARC supported:
> >        no change in wait times, ~60s total
> > 
> > And testing dev->link_active_reporting instead of speed means slow
> > hotplug ports (and possibly other slow ports that implement DLLLARC)
> > that previously were in case 3) will now be in case 4).
> 
> Yes, and we do that because if the device gets unplugged while we were
> in susppend we don't want to wait for the total 60s for it to become
> ready. That's what the DLLLARC can tell us (for ports that support it).
> For the ports that do not we want to give the device some time but not
> to wait for that 60s so we wait for the 1s as the "minimum" requirement
> from the spec before it can be determined "broken".

Ah, thanks, I think I see what you're doing here.

I think what makes this confusing is that there are several reasons
for waiting, and they're mixed together and done at various places:

  1) We want to avoid PCIe protocol errors, e.g., Completion Timeout,
  and the spec specifies how long to wait before sending a config
  request to a device below the port.

  2) We want to wait for slow devices to finish internal
  initialization even after it can respond to config requests, e.g.,
  with Request Retry Status completions.  The spec doesn't say how
  long to wait here, so we arbitrarily wait up to 60s.

  3) We want to detect missing devices (possibly removed while
  suspended) quickly, without waiting 60s.

I think this might be easier to follow if we restructured a bit, maybe
something like this:

  bool wait_for_link_active(struct pci_dev *bridge)
  {
    /* I don't see a max time to Link Active in the spec (?) */

    for (timeout = 1000; timeout > 0; timeout -= 10) {
      pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &status);
      if (status & PCI_EXP_LNKSTA_DLLLA)
        return true;
      msleep(10);
    }
    return false;
  }

  pci_bridge_wait_for_secondary_bus(...)
  {
    ...
    if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
      msleep(100);
    } else {
      link_active = wait_for_link_active(dev);
      if (link_active)
        msleep(100);
    }

    /* Everything above is delays mandated by PCIe r6.0 sec 6.6.1 */

    if (dev->link_active_reporting) {
      pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
      if (!(status & PCI_EXP_LNKSTA_DLLLA))
        /* all downstream devices are disconnected; maybe mark them? */
        return;
    }

    /* Wait for non-RRS completion */
    pci_dev_wait(child, PCIE_RESET_READY_POLL_MS);
  }