Re: PCI trouble on mvebu (Turris Omnia)

On 29/10/2020 11:41, Toke Høiland-Jørgensen wrote:
Bjorn Helgaas <helgaas@xxxxxxxxxx> writes:

[+cc Pali, Marek, Thomas, Jason]

On Wed, Oct 28, 2020 at 04:40:00PM +0000, ™֟☻̭҇ Ѽ ҉ ® wrote:
On 28/10/2020 16:08, Toke Høiland-Jørgensen wrote:
Bjorn Helgaas <helgaas@xxxxxxxxxx> writes:
On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
Toke Høiland-Jørgensen <toke@xxxxxxxxxx> writes:
Bjorn Helgaas <helgaas@xxxxxxxxxx> writes:

[+cc vtolkm]

On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
Hi everyone

I'm trying to get a mainline kernel to run on my Turris Omnia, and am
having some trouble getting the PCI bus to work correctly. Specifically,
I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
the resource request fix[0] applied on top.

The kernel boots fine, and the patch in [0] makes the PCI devices show
up. But I'm still getting initialisation errors like these:

[    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)

and the WiFi drivers fail to initialise with what appears to me to be
errors related to the bus rather than to the drivers themselves:

[    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.524473] ath9k 0000:01:00.0: Failed to initialize device
[    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
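
A note on reading the logs above: a PCI read from a device that does not respond returns all ones, so a BAR readback of 0xffffffff suggests the device was unreachable at that moment rather than that the BAR write itself was wrong. As a rough aid, the following Python sketch scans dmesg text for such lines. The function name and the regular expression are illustrative assumptions based only on the log format quoted in this thread.

```python
import re

# Matches lines such as:
#   pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
# The pattern is an assumption derived from the log lines quoted in this thread.
BAR_ERROR = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): error updating "
    r"\((?:high )?(?P<wrote>0x[0-9a-f]+) != (?P<read>0x[0-9a-f]+)\)"
)


def all_ones_readbacks(dmesg_text):
    """Return sorted device addresses whose BAR readback was 0xffffffff."""
    devices = set()
    for match in BAR_ERROR.finditer(dmesg_text):
        if int(match.group("read"), 16) == 0xFFFFFFFF:
            devices.add(match.group("dev"))
    return sorted(devices)


sample = """\
[    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
"""
print(all_ones_readbacks(sample))
```

Both wireless cards show up in the result for the sample, which is consistent with the later "config space inaccessible" message.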

lspci looks OK, though:

# lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)

Does anyone have any clue what could be going on here? Is this a bug, or
did I miss something in my config or other initialisation? I've tried
with both the stock u-boot distributed with the board, and with an
upstream u-boot from latest master; doesn't seem to make any difference.
Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
don't think we have a fix yet.
Yes! Turning that off does indeed help! Thanks a bunch :)

You mention that bisecting this would be helpful - I can try that
tomorrow; any idea when this was last working?
OK, so I tried to bisect this, but, erm, I couldn't find a working
revision to start from? I went all the way back to 4.10 (which is the
first version to include the device tree file for the Omnia), and even
on that, the wireless cards were failing to initialise with ASPM
enabled...
I have no personal experience with this device; all I know is that the
bugzilla suggests that it worked in v5.4, which isn't much help.

Possibly the apparent regression was really a .config change, i.e.,
CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested and it
"worked" but got enabled later and it started failing?
Yeah, I suspect so. The OpenWrt config disables CONFIG_PCIEASPM by
default and only turns it on for specific targets. So I guess that it's
most likely that this has never worked...
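
One way to check the ".config change" theory is to compare the setting in the config of a kernel that worked against one that fails. A minimal Python sketch, assuming the standard .config text format; the helper name kconfig_state is made up for illustration:

```python
def kconfig_state(config_text, option):
    """Return the value assigned to a Kconfig option ('y', 'm', ...),
    or None if the option is disabled or not mentioned at all."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as a comment in .config files.
        if line == f"# {option} is not set":
            return None
        # The '=' must follow the name directly, so CONFIG_PCIEASPM
        # does not match CONFIG_PCIEASPM_DEFAULT.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


working = "CONFIG_PCI=y\n# CONFIG_PCIEASPM is not set\n"
failing = "CONFIG_PCI=y\nCONFIG_PCIEASPM=y\nCONFIG_PCIEASPM_DEFAULT=y\n"
print(kconfig_state(working, "CONFIG_PCIEASPM"))
print(kconfig_state(failing, "CONFIG_PCIEASPM"))
```

The two sample strings are invented stand-ins; in practice the text would come from the .config files of the kernels being compared.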

Maybe the debug patch below would be worth trying to see if it makes
any difference?  If it *does* help, try omitting the first hunk to see
if we just need to apply the quirk_enable_clear_retrain_link() quirk.
Tried, doesn't help...

-Toke
Found this patch

https://github.com/openwrt/openwrt/blob/7c0496f29bed87326f1bf591ca25ace82373cfc7/target/linux/mvebu/patches-5.4/405-PCI-aardvark-Improve-link-training.patch

that mentions the Compex WLE900VX card, which reading the lspci verbose
output from the bugtracker seems to be the device being troubled.
Interesting.  Indeed, the Compex WLE900VX card seems to have the
Qualcomm Atheros QCA9880 on it, and it looks like Toke's system has
the same device in it.

The patch you mention (https://git.kernel.org/linus/43fc679ced18) is
for aardvark, so of course doesn't help mvebu.

PCIe hardware is supposed to automatically negotiate the highest link
speed supported by both ends.  But software *is* allowed to set an
upper limit (the Target Link Speed in Link Control 2).  If we initiate
a retrain and the link doesn't come back up, I wonder if we should try
to help the hardware out by using Target Link Speed to limit to a
lower speed and attempting another retrain, something like this hacky
patch: (please collect the dmesg log if you try this)
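
For readers following along without the patch text: the sketch below is not the patch that was posted. It only illustrates, in Python on plain integers, the register arithmetic the idea relies on. The mask values match the kernel's pci_regs.h definitions; the function names are hypothetical and nothing here touches hardware.

```python
# Field masks from the PCIe capability structure.
PCI_EXP_LNKCAP_SLS = 0x0000000F   # Link Capabilities: Max Link Speed
PCI_EXP_LNKCTL_RL = 0x0020        # Link Control: Retrain Link
PCI_EXP_LNKCTL2_TLS = 0x000F      # Link Control 2: Target Link Speed


def with_target_speed(lnkctl2, speed):
    """Return a 16-bit Link Control 2 value with Target Link Speed replaced.

    speed uses the usual encoding: 1 is 2.5 GT/s, 2 is 5 GT/s, 3 is 8 GT/s.
    """
    return (lnkctl2 & ~PCI_EXP_LNKCTL2_TLS & 0xFFFF) | (speed & PCI_EXP_LNKCTL2_TLS)


def with_retrain(lnkctl):
    """Return a Link Control value with the Retrain Link bit set."""
    return lnkctl | PCI_EXP_LNKCTL_RL


def fallback_speeds(lnkcap):
    """List the speed encodings to try, from the advertised maximum down to 1."""
    max_speed = lnkcap & PCI_EXP_LNKCAP_SLS
    return list(range(max_speed, 0, -1))


# Hypothetical register contents, chosen only to exercise the helpers.
print(fallback_speeds(0x00000002))
print(hex(with_target_speed(0x0042, 1)))
print(hex(with_retrain(0x0040)))
```

A real implementation would write each candidate speed to Link Control 2, set Retrain Link, and wait for the link to come up before moving to the next lower speed.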
Well, I tried it, but don't see any of the 'lnkcap2' output from that
new function:

[    1.545853] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    1.545878] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    1.545894] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    1.545907] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    1.545920] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    1.545933] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.545945] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.545958] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.545970] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.545982] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.545994] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.546006] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.546014] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.546181] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    1.546190] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.546197] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    1.546204] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    1.546210] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    1.546216] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    1.546220] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    1.546225] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    1.546294] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    1.546308] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546482] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.546495] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.546643] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.546656] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.547379] PCI: bus0: Fast back to back transfers disabled
[    1.547387] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547394] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547402] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.547484] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.547507] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.547615] pci 0000:01:00.0: supports D1
[    1.547620] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.547730] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.631937] PCI: bus2: Fast back to back transfers enabled
[    1.631945] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.632655] PCI: bus3: Fast back to back transfers enabled
[    1.632662] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.632694] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.632702] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.632710] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.632718] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.632726] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0600000-0xe06007ff pref]
[    1.632734] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.632741] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.632746] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632752] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.632760] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.632769] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.632776] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
[    1.632782] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.632788] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.632793] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.632800] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.632807] pci 0000:00:03.0: PCI bridge to [bus 03]

(and then later, still):
[    3.476364] pci 0000:00:01.0: enabling device (0140 -> 0142)
[    3.477542] ata1: SATA link down (SStatus 0 SControl 300)
[    3.482126] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
[    3.487487] ata2: SATA link down (SStatus 0 SControl 300)
[    3.493379] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
[    3.505891] ath: phy0: Unable to initialize hardware; initialization status: -95
[    3.513325] ath9k 0000:01:00.0: Failed to initialize device
[    3.518933] ath9k: probe of 0000:01:00.0 failed with error -95
[    3.524862] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
[    3.531904] pci 0000:00:02.0: enabling device (0140 -> 0142)
[    3.537590] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[    3.577436] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    3.583948] ath10k_pci: probe of 0000:02:00.0 failed with error -110


-Toke


Same result my end - run tested with next-20201027

N.B. node does not boot anymore with next-20201028, but that is independent of this patch and apparently another issue.


