On Friday 25 February 2022 14:12:30 Marcel Menzel wrote:
> On 24.02.2022 at 18:21, Pali Rohár wrote:
> > On Thursday 24 February 2022 10:25:32 Bjorn Helgaas wrote:
> > > On Thu, Feb 24, 2022 at 05:00:30PM +0100, Marcel Menzel wrote:
> > > > +linux-pci
> > > >
> > > > On 24.02.2022 at 14:52, Marcel Menzel wrote:
> > > > > On 24.02.2022 at 14:09, Marcel Menzel wrote:
> > > > > > Hello,
> > > > > >
> > > > > > When upgrading from kernel 5.16.2 to a newer version (tried 5.16.3
> > > > > > and 5.16.10 with unchanged .config), the Kernel fails to detect both
> > > > > > my installed mPCIe WiFi cards in my Turris Omnia (newer version,
> > > > > > silver case, GPIO pins installed again).
> > > > > > I have two Mediatek MT7915 based cards installed. I also tried with
> > > > > > one Atheros ath9k and one ath10k based card, yielding the same
> > > > > > result. On a Kernel version newer than 5.16.2, all cards aren't
> > > > > > getting recognized correctly.
> > > > > >
> > > > > > Before 5.16.3 I also had to disable PCIe ASPM via boot argument,
> > > > > > otherwise the WiFi drivers would complain about weird device
> > > > > > behaviors and failing to initialize them, but re-enabling it does
> > > > > > not yield any different results.
> > >
> > > Please try this commit, which is headed to mainline today:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus&id=c49ae619905eebd3f54598a84e4cd2bd58ba8fe9
> > >
> > > This commit should fix the PCI enumeration problem.
> >
> > It should fix that regression. If not, please let me know.
>
> Can confirm this patch solving the issue. Many thanks!

Perfect!

> > > If you still have
> > > to disable ASPM, that sounds like a separate problem that we should
> > > also try to debug.
> >
> > This is different and known issue and **not** related to ASPM.
> > I spent some time on it. Initially I thought it was a bug in Atheros
> > cards, but now I'm under the impression that this is an issue in Marvell
> > PCIe HW: link retraining (a required step of ASPM) triggers either
> > Link Down or Hot Reset, which triggers another Atheros issue (this one
> > is already documented in kernel pci quirks code).
> >
> > I will try to implement some workaround for this, but the requirement is
> > to have all new improvements in pci-mvebu.c + pci-aardvark.c drivers...
> > and the review process is slow. So it would not be before all those
> > changes are reviewed and merged.
>
> Removing "pcie_aspm=off" works for my MT7915E based cards, having had no
> issues so far. So it doesn't seem to be an issue with the Marvell hardware
> itself at least.

That is probably because the MT7915E card does not trigger that issue. But I
think the issue is really in the Marvell hardware.

> Regarding Atheros cards: I disabled it back then for my Atheros AR9582 &
> QCA9880 cards and never re-enabled it when I switched to the MT7915E cards,
> which I forgot to mention in my first mail, sorry!
> I put those two cards back into the device to test it, and the same problem
> occurs why I disabled it back then. The router completely freezes while
> booting with this as the last log lines (gathered via serial):
>
> [ 10.400986] ath9k 0000:02:00.0: can't change power state from D3cold to
> D0 (config space inaccessible)
> [ 10.466924] ath10k_pci 0000:03:00.0: can't change power state from D3cold
> to D0 (config space inaccessible)
> [ 10.613847] ath10k_pci 0000:03:00.0: failed to wake up device : -110

At this stage there is no link with the card. But the kernel does not know
it, as the pci-mvebu.c driver is missing an implementation of the DLLSC
interrupt. We need DLLSC support for debugging this issue.

For another Marvell driver (pci-aardvark.c) there is already a patch pending
review which adds DLLSC interrupt support:
https://lore.kernel.org/linux-pci/20220220193346.23789-9-kabel@xxxxxxxxxx/

So on Armada 3720 platforms it is possible to start debugging it.

I also have (experimental) DLLSC support prepared for pci-mvebu.c, but it
depends on the summary interrupt, which is missing in irq-armada-370-xp.c:
https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-mvebu

So without that summary interrupt in the irq-armada-370-xp.c driver, it is
not possible to get DLLSC information in the pci-mvebu.c driver.

> [ 10.622944] usb 1-1: New USB device found, idVendor=0cf3, idProduct=3004,
> bcdDevice= 0.02
> [ 10.635092] usb 1-1: New USB device strings: Mfr=0, Product=0,
> SerialNumber=0
> [ 10.659930] ath10k_pci: probe of 0000:03:00.0 failed with error -110
>
> This seems to be another topic however. I'd be glad to test and try to debug
> fixes and / or gather additional information on my hardware regarding this
> problem.
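
Until DLLSC reporting is available, the only trace of the lost link is in
driver messages like the ones quoted above. For gathering that information
across boots, something like the following could help. It is only a minimal
sketch: it assumes a raw serial capture (no "> " quote prefixes), the message
texts are copied from the log in this thread and are not a stable kernel
interface, and the names scan_log and SAMPLE_LOG are made up for illustration.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:02:00.0
BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

# Message texts copied from the log quoted in this thread; they may differ
# between kernel versions.
INACCESSIBLE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<bdf>" + BDF + r"):\s+"
    r"can't change power state from \S+ to \S+\s+\(config space inaccessible\)"
)
PROBE_FAILED_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<driver>\S+):\s+"
    r"probe of (?P<bdf>" + BDF + r")\s+failed with error (?P<err>-?\d+)"
)


def scan_log(log_text):
    """Return (inaccessible, probe_failed) lists found in a kernel log."""
    # Serial captures wrap long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    inaccessible = [
        (m["bdf"], m["driver"], float(m["time"]))
        for m in INACCESSIBLE_RE.finditer(flat)
    ]
    probe_failed = [
        (m["bdf"], m["driver"], int(m["err"]))  # -110 is -ETIMEDOUT on Linux
        for m in PROBE_FAILED_RE.finditer(flat)
    ]
    return inaccessible, probe_failed


SAMPLE_LOG = """\
[ 10.400986] ath9k 0000:02:00.0: can't change power state from D3cold to
D0 (config space inaccessible)
[ 10.466924] ath10k_pci 0000:03:00.0: can't change power state from D3cold
to D0 (config space inaccessible)
[ 10.613847] ath10k_pci 0000:03:00.0: failed to wake up device : -110
[ 10.659930] ath10k_pci: probe of 0000:03:00.0 failed with error -110
"""

if __name__ == "__main__":
    inaccessible, probe_failed = scan_log(SAMPLE_LOG)
    for bdf, driver, when in inaccessible:
        print(f"{bdf} ({driver}): config space inaccessible at {when}s")
    for bdf, driver, err in probe_failed:
        print(f"{bdf} ({driver}): probe failed with error {err}")
```

This only lists which devices lost config space access and when. It cannot
tell whether the cause was Link Down or Hot Reset; for that the DLLSC support
is still needed.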