"™֟☻̭҇ Ѽ ҉ ®" <vtolkm@xxxxxxxxxxxxxx> writes: > On 02/11/2020 16:54, Toke Høiland-Jørgensen wrote: >> Pali Rohár <pali@xxxxxxxxxx> writes: >> >>> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote: >>>> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm@xxxxxxxxxxxxxx> writes: >>>> >>>>> On 30/10/2020 15:23, Pali Rohár wrote: >>>>>> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote: >>>>>>> Pali Rohár <pali@xxxxxxxxxx> writes: >>>>>>>> My experience with that WLE900VX card, aardvark driver and aspm code: >>>>>>>> >>>>>>>> Link training in GEN2 mode for this card succeed only once after reset. >>>>>>>> Repeated link retraining fails and it fails even when aardvark is >>>>>>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have >>>>>>>> working link training. >>>>>>>> >>>>>>>> What I did in aardvark driver: Set mode to GEN2, do link training. If >>>>>>>> success read "negotiated link speed" from "Link Control Status Register" >>>>>>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then >>>>>>>> retrain link again (for WLE900VX now it would be at GEN1). After that >>>>>>>> card is stable and all future retraining (e.g. from aspm.c) also passes. >>>>>>>> >>>>>>>> If I do not change aardvark mode from GEN2 to GEN1 the second link >>>>>>>> training fails. And if I change mode to GEN1 after this failed link >>>>>>>> training then nothing happen, link training do not success. >>>>>>>> >>>>>>>> So just speculation now... In current setup initialization of card does >>>>>>>> one link training at GEN2. Then aspm.c is called which is doing second >>>>>>>> link retraining at GEN2. And if it fails then below patch issue third >>>>>>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark >>>>>>>> then second link retraining must be at GEN1 (not GEN2) to workaround >>>>>>>> this issue. >>>>>>>> >>>>>>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link >>>>>>>> retraining at GEN2 speed? And always force GEN1 speed prior link >>>>>>>> training? >>>>>>> Sounds like a plan. I poked around in aspm.c and must confess to being a >>>>>>> bit lost in the soup of registers ;) >>>>>>> >>>>>>> So if one of you can cook up a patch, that would be most helpful! >>>>>> I modified Bjorn's patch, explicitly set tls to 1 and added debug info >>>>>> about cls (current link speed, that what is used by aardvark). It is >>>>>> untested, I just tried to compile it. >>>>>> >>>>>> Can try it? >>>>>> >>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c >>>>>> index 253c30cc1967..f934c0b52f41 100644 >>>>>> --- a/drivers/pci/pcie/aspm.c >>>>>> +++ b/drivers/pci/pcie/aspm.c >>>>>> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link) >>>>>> unsigned long end_jiffies; >>>>>> u16 reg16; >>>>>> >>>>>> + u32 lnkcap2; >>>>>> + u16 lnksta, lnkctl2, cls, tls; >>>>>> + >>>>>> + pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2); >>>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta); >>>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2); >>>>>> + cls = lnksta & PCI_EXP_LNKSTA_CLS; >>>>>> + tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS; >>>>>> + >>>>>> + pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n", >>>>>> + lnkcap2, (lnkcap2 & 0x3F) >> 1, >>>>>> + lnksta, cls, >>>>>> + lnkctl2, tls); >>>>>> + >>>>>> + tls = 1; >>>>>> + pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2, >>>>>> + PCI_EXP_LNKCTL2_TLS, tls); >>>>>> + pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2); >>>>>> + pci_info(parent, "lnkctl2 %#010x new tls %#03x\n", >>>>>> + lnkctl2, tls); >>>>>> + >>>>>> pcie_capability_read_word(parent, PCI_EXP_LNKCTL, ®16); >>>>>> reg16 |= PCI_EXP_LNKCTL_RL; >>>>>> pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16); >>>>>> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link) >>>>>> break; >>>>>> msleep(1); >>>>>> } while (time_before(jiffies, end_jiffies)); >>>>>> + pci_info(parent, "lnksta %#06x new cls %#03x\n", >>>>>> + lnksta, (cls & PCI_EXP_LNKSTA_CLS)); >>>>>> return !(reg16 & PCI_EXP_LNKSTA_LT); >>>>>> } >>>>>> >>>>> Still exhibiting the BAR update error, run tested with next--20201030 >>>> Yup, same for me :( >>> So then it is different issue and not similar to aardvark one. >>> >>> Anyway, was ASPM working on some previous kernel version? Or was it >>> always broken on Turris Omnia? >> I tried bisecting and couldn't find a commit that worked. And OpenWrt by >> default builds with ASPM off, so my best guess is that it was always >> broken. >> >> However, the two other PCI slots *do* work with ASPM on, as long as >> they're both occupied when booting. If I only have one card installed >> apart from the dodge WLE900, both of them fail... > > Just to be sure it is not a (particular) mPCIe slot issue on the TO - > did you change the device order in the mPCIe slots? No, I didn't. > On my node: > > - right slot (next to the CPU) hosts a SSD > - centre slot hosts WLE900VX > - left slot (over the SIM card slot) hosts the WLE200N2 That's the same order as the PCI subsystem enumerates the slots (on my machine at least). I have WLE200/WLE900/MT76 in those three slots, which makes slot 1 and 3 work, while slot 2 craps out. If I remove the MT76 card (as it was originally), neither of slots 1 and 2 work... -Toke