Re: PCI: Work around PCIe link training failures

"Maciej W. Rozycki" <macro@xxxxxxxxxxx> · Wed, 7 Aug 2024 12:14:13 +0100 (BST)

On Wed, 7 Aug 2024, Matthew W Carlis wrote:

> > it does seem like this series made ASMedia ASM2824 work better but
> > caused regressions elsewhere, so maybe we just need to accept that
> > ASM2824 is slightly broken and doesn't work as well as it should.
> 
> One of my colleagues challenged me to provide a more concrete example
> where the change will cause problems. One such configuration would be not
> implementing the Power Controller Control in the Slot Capabilities Register.
> Then, powering off the slot via out-of-band interfaces would result in the
> kernel forcing the DSP to Gen1 100% of the time as far as I can tell.
> The aspect of this force to Gen1 that is the most concerning to my team is
> that it isn't cleaned up even if we replaced the EP with some other EP.

 Why does that happen?

 For the quirk to trigger, the link has to be down and the LBMS Link Status
bit has to be set, which as per the PCIe spec results from link management
events while the link was previously up, and both conditions have to hold
while the PCIe device in question is being rescanned, so there are a lot of
conditions to meet.  Is
it the case that in your setup there is no device at this point, but one 
gets plugged in later?
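
 As a rough illustration of the register side of those conditions, here is
a minimal sketch in C.  The two bit values are the Link Status register bits
as defined in the PCIe spec and in Linux's pci_regs.h; the helper name
quirk_would_trigger() is hypothetical, and the sketch does not model the
rescan context or anything else the actual quirk code does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bits (values as in the PCIe spec / pci_regs.h). */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */

/*
 * Hypothetical helper: returns true if the link is reported down
 * (DLLLA clear) while LBMS is still latched from an earlier event.
 */
bool quirk_would_trigger(uint16_t lnksta)
{
	return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
	       PCI_EXP_LNKSTA_LBMS;
}
```

 With this model a stale LBMS bit left over from a previous device is
enough to satisfy the check once the link is down, whatever gets plugged
in afterwards.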

 One aspect to mention here is that when taking a device offline the LBMS 
Link Status bit really ought to be cleared by Linux in the corresponding 
downstream port of the parent device.  As I recall, Ilpo was working on a
broader link bandwidth management subsystem for Linux, which would do that
among other things.  He asked me to run experiments with my problematic
machine to see if this would interfere, but unrelated issues with the DRAM
controller (now fixed by reducing the DDR clock rate) have prevented me
from doing
that.  I'll see if I can get back to that soon.
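
 For reference, LBMS is a write-1-to-clear (RW1C) bit, so clearing it means
writing a 1 to that bit position; in the kernel this would typically go
through pcie_capability_write_word() on the downstream port.  The sketch
below only models the RW1C semantics on a plain integer, under the
assumption that all other Link Status bits are read-only; lnksta_write() is
a hypothetical name, not a kernel function.

```c
#include <stdint.h>

#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS 0x8000 /* Link Autonomous Bandwidth Status */

/* The RW1C bits of the Link Status register. */
#define LNKSTA_RW1C_MASK (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS)

/*
 * Hypothetical model of a register write: a 1 written to an RW1C bit
 * clears it, a 0 leaves it alone, and read-only bits ignore the write.
 */
uint16_t lnksta_write(uint16_t current, uint16_t written)
{
	return current & ~(written & LNKSTA_RW1C_MASK);
}
```

 Writing PCI_EXP_LNKSTA_LBMS alone therefore clears only that bit and
leaves the rest of the register as it was.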

> I was curious about the PCIe devices mentioned in the commit because I
> look at crazy malfunctioning devices too often so I pasted the following:
> "Delock Riser Card PCI Express 41433" into Google.
> I'm not really a physical layer guy, but is it possible that the reported
> issue is due to signal integrity? I'm not sure if sending PCIe over a USB
> cable is "reliable".

 Well, it's a transmission line and as long as bandwidth and latency
requirements are met it's as good as any.  My understanding has been that
one of the objectives of PCIe over conventional PCI was to make external
links, such as those to expansion boxes, easier to implement.

 Please note that it is a purpose-made cable too, rather than just an 
off-the-shelf USB cable.  Then I have solutions deployed with PCIe routed 
over DVI cables too.

 I doubt it could be a signal integrity issue, because once the link has 
negotiated the highest speed possible (Gen2) via this complicated dance it 
works with no issues for months.  I'm not sure what the highest uptime for 
the system in question exactly was, but it was in the range of half a 
year, and I have a network interface downstream which I regularly use for 
heavy NFS traffic in GNU tool chain regression verification, so any issue 
would have shown up pretty quickly.

 Given how switching between speed rates works with PCIe (by establishing 
a link at 2.5GT/s first and then exchanging rates available as data before 
choosing the highest supported by both endpoints) I suspect that it is a 
protocol issue: either or both devices have got it slightly wrong, which 
breaks it when they're combined together.  Otherwise why would retraining
to 5GT/s by hand work when it fails if left to the hardware itself?
There is not much difference, if any, between the two scenarios really.
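
 To make the selection step concrete, here is a minimal sketch in C of
choosing the highest rate both sides advertise.  It assumes each side's
rates are given as a bit vector laid out like the Supported Link Speeds
Vector (bit 0 = 2.5GT/s, bit 1 = 5GT/s, bit 2 = 8GT/s, and so on);
highest_common_speed() is a hypothetical helper and this is a
simplification, not a model of the actual training state machine.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the rate vectors advertised by both link
 * partners, return the generation number of the highest rate they have
 * in common (1 = 2.5GT/s, 2 = 5GT/s, ...), or 0 if there is none.
 */
int highest_common_speed(uint8_t local, uint8_t partner)
{
	uint8_t common = local & partner;
	int gen = 0;

	/* Find the position of the highest set bit, counted from 1. */
	while (common) {
		gen++;
		common >>= 1;
	}
	return gen;
}
```

 For a Gen2-only device (0x03) facing a Gen3-capable port (0x07) this
yields 2, i.e. 5GT/s, which is the rate the hardware would be expected to
settle on by itself.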

> I've never worked with an ASMedia switch and don't have a reliable way to
> reproduce anything like the interaction between the two devices at hand. As
> much as I hate to make the request, my thinking is that the patch should be
> reverted until there is a solution that doesn't leave the link forced to
> Gen1 forever for every EP thereafter.

 I'm working on such a change this week.

 It's just that my primary objectives for this maintenance visit at my lab 
were to fix a pair of broken PSUs (which I have now done) and upgrade the 
firmware of a console manager device to fix an issue with a remote serial 
BREAK feature affecting Magic SysRq (also completed), plus I have a day 
job too that is unrelated.

 I also bought a PCIe to dual M.2 M-Key option card with the ASM2824 
switch onboard to have a different setup to evaluate and determine if this 
issue is specific to the RISC-V board or not.  Unfortunately the ASM2824 
does not bring the downstream ports up if the card is placed in a slot 
that does not supply Vaux (which is a conforming arrangement according to 
the PCIe spec), apparently due to a quirk in the ASM2824 switch according 
to the option card manufacturer, and there is no software workaround 
possible.

 So I won't be able to use this alternative arrangement until I have 
modified the option card, which I won't be able to do before October at the
earliest.

 I just can't help it that there are so many broken hardware designs out
there, and I have to strive to navigate through them and do my job
regardless.

  Maciej