Re: [PATCH 1/1] PCI: Use the correct bit in Link Training not active check

On Fri, 1 Mar 2024, Maciej W. Rozycki wrote:

> On Fri, 1 Mar 2024, Ilpo Järvinen wrote:
> 
> > Besides Link Training bit, pcie_retrain_link() can also be asked to
> > wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
> > 'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
> > 'pci/enumeration'").
> 
>  Nope, it was added with commit 680e9c47a229 ("PCI: Add support for 
> polling DLLLA to pcie_retrain_link()"), before said merge.

Ah sorry, my wording was not good here. I meant the line I am changing
in the patch, and that line didn't exist in 680e9c47a229 at all.

So yes, waiting on DLLLA via use_lt was added in 680e9c47a229, but it
was the merge commit that brought the Implementation Note related code
into pcie_retrain_link(), and I think that code was mismerged when it
comes to use_lt.

> > pcie_retrain_link() first tries to ensure Link Training is not
> > currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
> > which must always check Link Training bit regardless of 'use_lt'.
> > Correct the pcie_wait_for_link_status() parameters to only wait for
> > the correct bit to ensure there is no ongoing Link Training.
> 
>  You're talking the PCIe spec here and code is talking a bad device case.
>
> > Since waiting for Data Link Layer Link Active bit is only used for the
> > Target Speed quirk, this only impacts the case when the quirk attempts
> > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > so pcie_retrain_link() will fail).
> 
>  NAK.  It's used for both clamping and unclamping and it will break the 
> workaround, because the whole point there is to wait until DLLA has been 
> set.  Using LT is not reliable because it will oscillate in the failure 
> case and seeing the bit clear does not mean link has been established.  

In pcie_retrain_link(), there are two calls into 
pcie_wait_for_link_status() and the second one of them is meant to 
implement the link-has-been-established check.
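
For reference, pcie_wait_for_link_status() is roughly along these lines
(a paraphrased sketch, not a verbatim copy, so details may differ a
bit); use_lt selects which LNKSTA bit is polled and active selects
whether we wait for it to become set or cleared:

static int pcie_wait_for_link_status(struct pci_dev *pdev,
                                     bool use_lt, bool active)
{
        u16 lnksta_mask, lnksta_match;
        unsigned long end_jiffies;
        u16 lnksta;

        /* use_lt picks the bit to poll, active the value to wait for */
        lnksta_mask = use_lt ? PCI_EXP_LNKSTA_LT : PCI_EXP_LNKSTA_DLLLA;
        lnksta_match = active ? lnksta_mask : 0;

        end_jiffies = jiffies + msecs_to_jiffies(PCIE_LINK_RETRAIN_TIMEOUT_MS);
        do {
                pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
                if ((lnksta & lnksta_mask) == lnksta_match)
                        return 0;
                msleep(1);
        } while (time_before(jiffies, end_jiffies));

        return -ETIMEDOUT;
}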

The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link
retraining race") and is only there to ensure no retraining is currently
in progress, so that the latest configuration is captured as required by
the Implementation Note. LT being cleared is exactly what is wanted for
that check because it means any earlier retraining has ended (a new one
might be starting but that doesn't matter; we saw the bit cleared, so
the new configuration should be in effect for that instance of link
retraining).

So my point is, the first check is not even meant to verify that the
link has been established.
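
To make the structure explicit, pcie_retrain_link() currently looks
roughly like this (a simplified sketch with comments trimmed, so not
verbatim):

int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
{
        int rc;

        /*
         * First wait: the Implementation Note check, i.e. make sure any
         * earlier retraining has ended before Retrain Link is set.  This
         * is the call the patch changes; it passes (use_lt, !use_lt).
         */
        rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
        if (rc)
                return rc;

        pcie_capability_set_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
        if (pdev->clear_retrain_link)
                pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL,
                                           PCI_EXP_LNKCTL_RL);

        /* Second wait: the actual link-has-been-established check. */
        return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
}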

>  What are you trying to achieve here and what problem is it to fix?

Actually, I was mistaken about what this breaks, so the problem is not
well described above, but I still think it is broken:

In the case of the quirk, DLLLA is not set before the 2.5GT/s speed is
attempted, right? Then the quirk sets the 2.5GT/s Target Speed and calls
into pcie_retrain_link().

The first call into pcie_wait_for_link_status() is made with (..., false,
true), which waits until DLLLA is set, but this happens before the OS has
even triggered Link Retraining. Since there's no retraining commanded by
the OS, DLLLA will not become set, the wait will time out, an error is
returned, and the quirk in turn returns failure.
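
In other words, for the quirk the first wait effectively expands to:

        /* use_lt == false here, so (use_lt, !use_lt) becomes: */
        rc = pcie_wait_for_link_status(pdev, false, true); /* DLLLA set */

and that can only time out, because Retrain Link has not been set yet at
this point.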

It could of course happen that, because of the HW retraining loop
(independent of OS control), the link retrains itself to 2.5GT/s just
from the OS setting the Target Speed alone, without the OS asking for
retraining; that is quite possible given that the HW behavior in your
Target Speed quirk case is not fully understood. Even if that's the
case, it does not seem good to rely on the HW-originated retraining loop
to trigger the link retraining that is necessary here.

Maybe this is a far-fetched thought, but perhaps it could explain why
you didn't get the link up with your setup when you tried to test this
earlier.

An alternative approach to fixing this problem would be to not make the
first call into pcie_wait_for_link_status() at all when use_lt == false.
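
That is, something like this (an untested sketch):

        /* Skip the LT-not-active wait when the caller asked for DLLLA */
        if (use_lt) {
                rc = pcie_wait_for_link_status(pdev, true, false);
                if (rc)
                        return rc;
        }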

Of course, I cannot test this with your configuration, so I cannot
confirm how the Target Speed quirk behaves; I only found this by reading
the code. The current code does not make sense, though, because the
first wait is supposed to wait for the LT bit, not for DLLLA.
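
That is, the fix in $subject boils down to making the first wait always
use the LT bit:

-       rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+       rc = pcie_wait_for_link_status(pdev, true, false);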

-- 
 i.
