Re: [PATCH v3] PCI/PM: Bail out early in pci_bridge_wait_for_secondary_bus() if link is not trained

Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> · Fri, 14 Apr 2023 13:11:47 +0300

Hi,

On Fri, Apr 14, 2023 at 09:42:38AM +0200, Lukas Wunner wrote:
> On Thu, Apr 13, 2023 at 01:16:42PM +0300, Mika Westerberg wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5037,6 +5037,22 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> >  		}
> >  	}
> >  
> > +	/*
> > +	 * Everything above is handling the delays mandated by the PCIe r6.0
> > +	 * sec 6.6.1.
> > +	 *
> > +	 * If the port supports active link reporting we now check one more
> > +	 * time if the link is active and if not bail out early with the
> > +	 * assumption that the device is not present anymore.
> > +	 */
> > +	if (dev->link_active_reporting) {
> > +		u16 status;
> > +
> > +		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> > +		if (!(status & PCI_EXP_LNKSTA_DLLLA))
> > +			return -ENOTTY;
> > +	}
> > +
> >  	return pci_dev_wait(child, reset_type,
> >  			    PCIE_RESET_READY_POLL_MS - delay);
> >  }
> 
> Hm, shouldn't the added code live in the
> 
> 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT)
> 
> branch?  For the else branch (Gen3+ devices with > 5 GT/s),
> we've already waited for the link to become active, so the
> additional check seems superfluous.  (But maybe I'm missing
> something.)

You are not missing anything ;-) Indeed it should belong there, and I
think we are also missing now the "optimization" for devices behind slow
link without active link reporting capabilities. That is we wait for the
1s instead of the whole 60s.

> I also note that this documentation change has been dropped
> vis-à-vis v1 of the patch, not sure if that's intentional:
> 
> -	* However, 100 ms is the minimum and the PCIe spec says the
> -	* software must allow at least 1s before it can determine that the
> -	* device that did not respond is a broken device. There is
> -	* evidence that 100 ms is not always enough, for example certain
> -	* Titan Ridge xHCI controller does not always respond to
> -	* configuration requests if we only wait for 100 ms (see
> -	* https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> +	* However, 100 ms is the minimum and the PCIe spec says the software
> +	* must allow at least 1s before it can determine that the device that
> +	* did not respond is a broken device. Also device can take longer than
> +	* that to respond if it indicates so through Request Retry Status
> +	* completions.

This is not intentional. I will add it back in the next version.

To summarize the v4 patch would look something like below. Only compile
tested but I will run real testing later today. I think it now includes
the 1s optimization and also checking of the active link reporting
support for the devices behind slow links. Let me know is I missed
something. 

It is getting rather complex unfortunately :(

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 61bf8a4b2099..f81a9e6aff84 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -64,6 +64,13 @@ struct pci_pme_device {
 
 #define PME_TIMEOUT 1000 /* How long between PME checks */
 
+/*
+ * Following exit from Conventional Reset, devices must be ready within 1 sec
+ * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
+ * Reset (PCIe r6.0 sec 5.8).
+ */
+#define PCI_RESET_WAIT 1000 /* msec */
+
 /*
  * Devices may extend the 1 sec period through Request Retry Status
  * completions (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper
@@ -5010,13 +5017,11 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	 * speeds (gen3) we need to wait first for the data link layer to
 	 * become active.
 	 *
-	 * However, 100 ms is the minimum and the PCIe spec says the
-	 * software must allow at least 1s before it can determine that the
-	 * device that did not respond is a broken device. There is
-	 * evidence that 100 ms is not always enough, for example certain
-	 * Titan Ridge xHCI controller does not always respond to
-	 * configuration requests if we only wait for 100 ms (see
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
+	 * However, 100 ms is the minimum and the PCIe spec says the software
+	 * must allow at least 1s before it can determine that the device that
+	 * did not respond is a broken device. Also device can take longer than
+	 * that to respond if it indicates so through Request Retry Status
+	 * completions.
 	 *
 	 * Therefore we wait for 100 ms and check for the device presence
 	 * until the timeout expires.
@@ -5027,30 +5032,29 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
-	} else {
-		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
-			delay);
-		if (!pcie_wait_for_link_delay(dev, true, delay)) {
-			/* Did not train, no need to wait any further */
-			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
-			return -ENOTTY;
+
+		/*
+		 * If the port supports active link reporting we now check one
+		 * more time if the link is active and if not bail out early
+		 * with the assumption that the device is not present anymore.
+		 */
+		if (dev->link_active_reporting) {
+			u16 status;
+
+			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
+			if (!(status & PCI_EXP_LNKSTA_DLLLA))
+				return -ENOTTY;
 		}
-	}
 
-	/*
-	 * Everything above is handling the delays mandated by the PCIe r6.0
-	 * sec 6.6.1.
-	 *
-	 * If the port supports active link reporting we now check one more
-	 * time if the link is active and if not bail out early with the
-	 * assumption that the device is not present anymore.
-	 */
-	if (dev->link_active_reporting) {
-		u16 status;
+		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
+	}
 
-		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
-		if (!(status & PCI_EXP_LNKSTA_DLLLA))
-			return -ENOTTY;
+	pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
+		delay);
+	if (!pcie_wait_for_link_delay(dev, true, delay)) {
+		/* Did not train, no need to wait any further */
+		pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
+		return -ENOTTY;
 	}
 
 	return pci_dev_wait(child, reset_type,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 022da58afb33..f2d3aeab91f4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -64,13 +64,6 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 #define PCI_PM_D3HOT_WAIT       10	/* msec */
 #define PCI_PM_D3COLD_WAIT      100	/* msec */
 
-/*
- * Following exit from Conventional Reset, devices must be ready within 1 sec
- * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
- * Reset (PCIe r6.0 sec 5.8).
- */
-#define PCI_RESET_WAIT		1000	/* msec */
-
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
 int pci_power_up(struct pci_dev *dev);