Re: PCI trouble on mvebu (Turris Omnia)

On 29/10/2020 20:30, Bjorn Helgaas wrote:
On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
Pali Rohár <pali@xxxxxxxxxx> writes:
I have been testing a mainline kernel on Turris Omnia with the two default
PCIe cards (WLE200 and WLE900) and it worked fine. But I do not know
if I had ASPM enabled or not.

So it is working fine for you when CONFIG_PCIEASPM is disabled and the whole
issue occurs only when CONFIG_PCIEASPM is enabled?

Yup, exactly. And I'm also currently testing with the default WLE200/900
cards... I just tried sticking an MT76-based WiFi card into the third
PCI slot, and that doesn't come up either when I enable PCIEASPM.

Huh.  So IIUC, the following cases all try to retrain the link and it
fails to come up again:

   - aardvark + WLE900VX (see commit 43fc679ced18)
   - mvebu + WLE200
   - mvebu + WLE900
   - mvebu + MT76

In all these cases, Linux was able to enumerate the NIC, which means
the link was up when firmware handed it off.

I think Linux decided the Common Clock Configuration was wrong, so it
tried to fix it and retrain the link, and the link didn't come back
up.

I don't have "lspci -vv" output from all of them, but in vtolkm's
case, the firmware handed off with:

   00:02.0 Root Port to [bus 02]  SlotClk+ CommClk+
   02:00.0 QCA986x/988x NIC       SlotClk+ CommClk-

Per spec (PCIe r5, sec 7.5.3.7), SlotClk is HwInit and CommClk is RW
and should power up as 0.  If I'm reading the implementation note
correctly, if SlotClk is set on both ends of the link, software should
set CommClk, so the config above *does* look wrong, and CommClk+ on
the Root Port suggests that firmware set it.
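
The rule in that implementation note can be written down as a small check.
This is only a sketch under my reading of the note: want_commclk is a
made-up helper name (not part of pciutils), it takes the raw Link Status
words of both ends as hex strings in the form setpci prints them, and the
example values are illustrative.

```shell
# Sketch: should Common Clock Configuration be set on a link?
# Assumption: Slot Clock Configuration is bit 12 (0x1000) of Link Status.
# want_commclk is a hypothetical helper, not an existing tool.
want_commclk() {
    up=$((0x$1))      # Link Status of the upstream end (Root Port)
    down=$((0x$2))    # Link Status of the downstream end (NIC)
    # CommClk is only appropriate when both ends report SlotClk+
    if [ $((up & 0x1000)) -ne 0 ] && [ $((down & 0x1000)) -ne 0 ]; then
        echo "CommClk+"
    else
        echo "CommClk-"
    fi
}

# Example: both ends report SlotClk+ (bit 12 set in 0x1011)
want_commclk 1011 1011
```

With SlotClk+ on both ends the helper prints CommClk+, which would have to
be set on both the Root Port and the NIC, not on one end only.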

I think both the aardvark and mvebu systems probably use U-Boot.  I
don't know U-Boot at all, but I don't see anything in it that touches
Link Control.  I'm curious what happens if you put one of these cards
in a PC.  If anybody tries it, please collect the "sudo lspci -vv" and
dmesg output.

We could quirk these NICs to avoid the retrain, but since aardvark and
mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
obvious connection, I doubt there's a simple hardware defect that
explains all these.

Maybe we're doing something wrong in the retrain, but obviously the
link came up in the first place.  AFAIK the only thing we're changing
is the CommClk setting, and that looks legitimate per spec.

Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
and $NIC appropriately, and try the following:

   # Set $ROOT and $NIC (update to match your system):

     # ROOT=00:02.0
     # NIC=02:00.0

   # Dump the Root Port and NIC Link registers:

     # setpci -s$ROOT CAP_EXP+0xc.l              # Link Capabilities
     # setpci -s$ROOT CAP_EXP+0x10.w             # Link Control
     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status

     # setpci -s$NIC  CAP_EXP+0xc.l              # Link Capabilities
     # setpci -s$NIC  CAP_EXP+0x10.w             # Link Control
     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

   # Retrain the link:

     # setpci -s$ROOT CAP_EXP+0x10.w=0x0020      # Link Control Retrain Link
     # sleep 1
     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

   # Set CommClk+ and retrain the link:

     # setpci -s$NIC  CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
     # setpci -s$ROOT CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
     # setpci -s$ROOT CAP_EXP+0x10.w=0x0060      # Link Control RL + CC
     # sleep 1
     # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
     # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status
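
One caveat about the writes above: "CAP_EXP+0x10.w=VALUE" stores VALUE into
the whole 16-bit Link Control word, so every bit not present in VALUE is
written as 0. If that is not intended, the new value can be computed from
the current one first. A minimal sketch, where lnkctl_with_bits is a
hypothetical helper name and the inputs are example values:

```shell
# Sketch: compute a read-modify-write value for Link Control.
# $1 = current Link Control word as printed by setpci (hex, no 0x prefix)
# $2 = bits to set (e.g. 0x0020 Retrain Link, 0x0040 Common Clock Config)
# lnkctl_with_bits is a hypothetical helper, not part of pciutils.
lnkctl_with_bits() {
    cur=$((0x$1))
    bits=$(($2))
    printf '%04x\n' $((cur | bits))
}

# Example: keep CommClk (0x0040) set while requesting a retrain (0x0020)
lnkctl_with_bits 0040 0x0020

# setpci can also do this in one step with its bits:mask syntax, which
# changes only the bits selected by the mask (not run here):
#   setpci -s$ROOT CAP_EXP+0x10.w=0x0020:0x0020
```

The example prints 0060, i.e. Retrain Link plus the CommClk bit that was
already set.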

ROOT=00:02.0
NIC=02:00.0
setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
setpci -s$ROOT CAP_EXP+0x10.w
0040
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0xc.l
00036c11
setpci -s$NIC  CAP_EXP+0x10.w
0000
setpci -s$NIC  CAP_EXP+0x12.w
1011
setpci -s$ROOT CAP_EXP+0x10.w=0x0020
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$NIC  CAP_EXP+0x10.w=0x0040
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$ROOT CAP_EXP+0x10.w=0x0040
setpci -s$ROOT CAP_EXP+0x10.w=0x0060
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1811
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
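
To make the raw Link Status words above easier to read, here is a small
decoder. It is only a sketch: decode_lnksta is a made-up helper name, and
the bit positions are my reading of the Link Status register layout.

```shell
# Sketch: decode a Link Status word as printed by setpci (hex, no 0x).
# decode_lnksta is a hypothetical helper, not part of pciutils.
decode_lnksta() {
    v=$((0x$1))
    speed=$(( v & 0xf ))            # bits 3:0  Current Link Speed (1 = 2.5 GT/s)
    width=$(( (v >> 4) & 0x3f ))    # bits 9:4  Negotiated Link Width
    train=$(( (v >> 11) & 1 ))      # bit 11    Link Training
    slotclk=$(( (v >> 12) & 1 ))    # bit 12    Slot Clock Configuration
    # bit 13 is only meaningful if the port advertises Data Link Layer
    # Link Active Reporting in Link Capabilities
    dllla=$(( (v >> 13) & 1 ))      # bit 13    Data Link Layer Link Active
    echo "speed=$speed width=x$width training=$train slotclk=$slotclk dllla=$dllla"
}

decode_lnksta 1011   # speed=1 width=x1 training=0 slotclk=1 dllla=0
decode_lnksta 1811   # speed=1 width=x1 training=1 slotclk=1 dllla=0
```

If I decode it correctly, the only difference between 1011 and 1811 is the
Link Training bit: one second after the second retrain request the Root Port
still reports the link as training, and the NIC's config space has not been
readable since the first retrain.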
