[PATCH 0/2] PCI: Rework error reporting with PCIe failed link retraining

Hi,

 This patch series addresses two issues observed by Ilpo as reported here: 
<https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@xxxxxxxxxxxxxxx/>, 
one with excessive delays incurred when `pcie_failed_link_retrain' is 
called but link retraining has not actually been attempted, and another 
one with an API misuse caused by a merge mistake.

 See the individual change descriptions for further details; 1/2 supersedes: 
<https://patchwork.kernel.org/project/linux-pci/patch/20240202134108.4096-1-ilpo.jarvinen@xxxxxxxxxxxxxxx/>, 
and 2/2 supersedes: 
<https://patchwork.kernel.org/project/linux-pci/patch/20240208132205.4550-1-ilpo.jarvinen@xxxxxxxxxxxxxxx/>.

 Unfortunately I cannot verify the changes anymore beyond just checking 
that the system `pcie_failed_link_retrain' was intended for still boots, 
because something has happened that prevents the problematic link from 
working at all.

 The system was up for 88 days and the link continued working throughout: 
I was logged in over a serial line wired through a PCIe serial option card 
further downstream and I communicated over the line just fine to log out 
in preparation for a reboot.  After the reboot the link did not respond, 
and after several further attempts, including reboots and power cycles, it 
still does not respond: LBMS is never set and I have never observed LT 
being set either.  This affects U-Boot too, as previously it reported:

PCIE-0: Link up (Gen1-x8, Bus0)
PCI Autoconfig: 02.03.00: Downstream link non-functional
PCI Autoconfig: 02.03.00: Retrying with speed restricted to 2.5GT/s...
PCI Autoconfig: 02.03.00: Succeeded!

and now it only reports:

PCIE-0: Link up (Gen1-x8, Bus0)

 Interestingly enough the system had its mainboard replaced those 3 months 
ago to deal with an unrelated problem, and with the new mainboard in place 
I already had issues with the option cards downstream from the PCIe switch 
immediately wired to 02.03.0.  I had to rewire and reseat the adapter and 
cards several times before it started working reliably.  Maybe something 
has happened to the adapter board with the PCIe switch that caused it to 
stop working, hopefully permanently.  Perhaps it has something to do with 
the power supply connection, which is via an FDC/Berg connector, not my 
favourite one.

 I have four such adapter boards total, so I can try and see if I am able 
to revive the original one or use a replacement one, but it won't happen 
right away, as I have the system installed in a remote lab ~1000mi/1600km 
away from me.  I'll try to bring the system back to fully working order at 
the next opportunity, but it is inconvenient for me to travel there right 
now just to address this problem, so it'll be a couple of weeks and likely 
more before I am able to say something.  I hope it's not the new mainboard 
(PCIe devices in the other slots work just fine).

 Hopefully I'll be able to fix it one way or another and will be able to 
report on LBMS behaviour too, that is whether it retriggers with every 
link training iteration or not.
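
 For reference, the LT and LBMS bits mentioned above live in the PCIe Link 
Status register.  Below is a minimal sketch of decoding them from a raw 
16-bit register value, assuming the bit masks as named in Linux's 
include/uapi/linux/pci_regs.h.  The helper names are hypothetical and not 
part of the patches, and the last helper is only an illustrative condition 
for a link that has flagged a bandwidth change but has not come up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bit masks (names as in Linux's pci_regs.h). */
#define PCI_EXP_LNKSTA_CLS   0x000f /* Current Link Speed */
#define PCI_EXP_LNKSTA_LT    0x0800 /* Link Training */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */

/* Hypothetical helper: link training is currently in progress. */
static inline bool lnksta_training(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_LT) != 0;
}

/* Hypothetical helper: LBMS has been set since it was last cleared. */
static inline bool lnksta_lbms(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_LBMS) != 0;
}

/* Hypothetical helper: the data link layer reports the link as active. */
static inline bool lnksta_link_active(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
}

/* Hypothetical helper: encoded current link speed (1 means 2.5GT/s). */
static inline unsigned int lnksta_speed(uint16_t lnksta)
{
	return lnksta & PCI_EXP_LNKSTA_CLS;
}

/*
 * Hypothetical helper, an illustrative assumption only: LBMS is set
 * while the data link layer is not active.
 */
static inline bool lnksta_suggests_failed_training(uint16_t lnksta)
{
	return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
	       PCI_EXP_LNKSTA_LBMS;
}
```

 With the link in its current state neither `lnksta_lbms' nor 
`lnksta_training' would ever return true for the port at 02.03.0, which 
matches what I see.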

 Meanwhile the patches are hopefully obvious enough to apply.

  Maciej



