Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec

Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> · Wed, 30 Oct 2019 13:33:53 +0200

On Tue, Oct 29, 2019 at 03:54:56PM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > Currently Linux does not follow PCIe spec regarding the required delays
> > after reset. A concrete example is a Thunderbolt add-in-card that
> > consists of a PCIe switch and two PCIe endpoints:
> > ...
> 
> > @@ -1025,15 +1025,11 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
> >  	if (state == PCI_D0) {
> >  		pci_platform_power_transition(dev, PCI_D0);
> >  		/*
> > -		 * Mandatory power management transition delays, see
> > -		 * PCI Express Base Specification Revision 2.0 Section
> > -		 * 6.6.1: Conventional Reset.  Do not delay for
> > -		 * devices powered on/off by corresponding bridge,
> > -		 * because have already delayed for the bridge.
> > +		 * Mandatory power management transition delays are handled
> > +		 * in pci_pm_runtime_resume() of the corresponding
> > +		 * downstream/root port.
> >  		 */
> >  		if (dev->runtime_d3cold) {
> > -			if (dev->d3cold_delay && !dev->imm_ready)
> > -				msleep(dev->d3cold_delay);
> 
> This removes the only use of d3cold_delay.  I assume that's
> intentional?  If we no longer need it, we might as well remove it from
> the pci_dev and remove the places that set it.  It'd be nice if that
> could be a separate patch, even if we waited a little longer than
> necessary at that one bisection point.

Yes, it is intentional. In the previous version I had function
pcie_get_downstream_delay() that used both d3cold_delay and imm_ready to
calculate the downstream device delay but you said:

  I'm not sold on the idea that this delay depends on what's *below* the                                                                                                   
  bridge.  We're using sec 6.6.1 to justify the delay, and that section                                                                                               
  doesn't say anything about downstream devices.

So I dropped it and use 100 ms instead.

Now that you mention, I think if we want to continue support that _DSM,
we should still take d3cold_delay into account in this patch. There is
also one driver (drivers/mfd/intel-lpss-pci.c) that sets it to 0.

> It also removes one of the three uses of imm_ready, leaving only the
> two in FLR.  I suspect there are other places we should use imm_ready,
> e.g., transitions to/from D1 and D2, but that would be beyond the
> scope of this patch.

Right, I think imm_ready does not apply here. If I understand correctly
it is exactly for D1, D2 and D3hot transitions which we should take into
account in pci_dev_d3_sleep() (which we don't do right now).

> > +	/*
> > +	 * For PCIe downstream and root ports that do not support speeds
> > +	 * greater than 5 GT/s need to wait minimum 100 ms. For higher
> > +	 * speeds (gen3) we need to wait first for the data link layer to
> > +	 * become active.
> > +	 *
> > +	 * However, 100 ms is the minimum and the PCIe spec says the
> > +	 * software must allow at least 1s before it can determine that the
> > +	 * device that did not respond is a broken device. There is
> > +	 * evidence that 100 ms is not always enough, for example certain
> > +	 * Titan Ridge xHCI controller does not always respond to
> > +	 * configuration requests if we only wait for 100 ms (see
> > +	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> > +	 *
> > +	 * Therefore we wait for 100 ms and check for the device presence.
> > +	 * If it is still not present give it an additional 100 ms.
> > +	 */
> > +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> > +	    pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> > +		return;
> 
> Shouldn't this be:
> 
>   if (!pcie_downstream_port(dev))
>     return
> 
> so we include PCI/PCI-X to PCIe bridges?

Yes, I'll change it in v3.