Re: [PATCH] PCI: Add a quirk to skip 1000 ms default link activation delay on some devices

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Wed, 9 Sep 2020 20:00:26 -0500

On Mon, Sep 07, 2020 at 11:43:49AM +0300, Mika Westerberg wrote:
> On Thu, Sep 03, 2020 at 01:11:22PM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 31, 2020 at 12:31:47PM +0300, Mika Westerberg wrote:
> > > Kai-Heng Feng reported that it takes a long time (> 1 s) to resume
> > > Thunderbolt-connected devices from both runtime suspend and system sleep
> > > (s2idle).
> > > 
> > > This was because some Downstream Ports that support > 5 GT/s do not also
> > > support Data Link Layer Link Active reporting.  Per PCIe r5.0 sec 6.6.1:
> > > 
> > >   With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
> > >   software must wait a minimum of 100 ms after Link training completes
> > >   before sending a Configuration Request to the device immediately below
> > >   that Port. Software can determine when Link training completes by
> > >   polling the Data Link Layer Link Active bit or by setting up an
> > >   associated interrupt (see Section 6.7.3.3).
> > > 
> > > Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting,
> > > but at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and
> > > Intel JHL7540 Thunderbolt 3 Bridge [8086:15e7, 8086:15ea, 8086:15ef] do
> > > not.
> > 
> > Is there any erratum about this?  I'm just hoping to avoid the
> > maintenance hassle of adding new devices to the quirk.  If Intel
> > acknowledges this as a defect and has a plan to fix it, that would
> > help a lot.  If they *don't* think it's a defect, then maybe they have
> > a hint about how we should handle this generically.
> 
> I don't think there is any public documentation about these chips so
> probably no errata either. I did ask our TBT HW folks about this but so
> far did not get any answer.

Huh.  AFAICT this is a non-fatal issue -- the only problem is that
resume takes longer than it should.  The fix is somewhat ugly, both
because we have to maintain a list of affected devices, and because it
clutters a generic code path that is already quite complicated.
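
To make the maintenance concern concrete, here is a rough standalone
sketch of what a per-device list amounts to.  It is not the patch and
does not use the kernel's quirk infrastructure; the struct and function
names are made up for illustration.  Only the four device IDs come from
the commit log quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID pair for illustration; not the kernel's struct pci_device_id. */
struct quirk_id {
	uint16_t vendor;
	uint16_t device;
};

/* Device IDs as listed in the quoted commit log. */
static const struct quirk_id no_dll_link_active_ids[] = {
	{ 0x8086, 0x15c0 },	/* Intel JHL6240 Thunderbolt 3 Bridge */
	{ 0x8086, 0x15e7 },	/* Intel JHL7540 Thunderbolt 3 Bridge */
	{ 0x8086, 0x15ea },	/* Intel JHL7540 Thunderbolt 3 Bridge */
	{ 0x8086, 0x15ef },	/* Intel JHL7540 Thunderbolt 3 Bridge */
};

/*
 * Hypothetical helper: return true if the port is on the list of devices
 * reported not to support Data Link Layer Link Active reporting.
 */
static bool port_lacks_dll_link_active(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(no_dll_link_active_ids) /
		   sizeof(no_dll_link_active_ids[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (no_dll_link_active_ids[i].vendor == vendor &&
		    no_dll_link_active_ids[i].device == device)
			return true;
	}
	return false;
}
```

Every affected device that turns up later means another entry in that
table, which is the hassle I'd like to avoid.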

That's all to say that I'm not very happy about this and am not in a
huge hurry to apply it.  Intel is usually pretty good about following
the PCIe spec and documenting issues when they occur.  For some reason
TBT seems like an exception.

I don't maintain the TBT-specific stuff, so I personally don't care
all that much about that.  But this issue is plain PCIe, nothing to do
with TBT.  Kai-Heng, you, and I have all spent a lot of time trying to
figure this out, and it makes me sad that Intel isn't giving us any
help.

Can you please ask them again?

Bjorn