On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia
> ASM2824 PCIe switch where link training never completes and the link
> continues switching between speeds indefinitely with the data link layer
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA
because the erratum makes PCI_EXP_LNKSTA_LT flap.

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
     it take link->pdev instead of link.

  2) Add the bit parameter.

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
    ...
    if (pcie_link_failed(dev))
      pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).

It *might* even be worth adding it and the stub first because that's a
trivial patch and wouldn't clutter the probe.c git history with all
the grotty details about ASM2824 and this topology.
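
Going back to the pcie_retrain_link(pdev, link_status_bit) idea above,
here is a minimal userspace sketch of the wait loop, assuming a
simulated Link Status register rather than real config space.  The
names struct fake_port, fake_read_lnksta() and
sketch_wait_for_link_status() are hypothetical and exist only for
illustration; the bit values are the ones in
include/uapi/linux/pci_regs.h.  One thing it makes visible is that the
two bits have opposite polarity: LT has to clear, DLLLA has to set, so
a bare bit parameter needs to carry that difference somehow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, same values as PCI_EXP_LNKSTA_* in pci_regs.h */
#define LNKSTA_LT	0x0800	/* Link Training in progress */
#define LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */

/* Hypothetical stand-in for a port; not a kernel structure. */
struct fake_port {
	uint16_t lnksta;	/* simulated Link Status value */
	int polls_until_up;	/* reads left before training settles; 0 = never */
};

/* Simulated register read: after enough polls the link comes up. */
static uint16_t fake_read_lnksta(struct fake_port *p)
{
	if (p->polls_until_up > 0 && --p->polls_until_up == 0) {
		p->lnksta &= ~LNKSTA_LT;
		p->lnksta |= LNKSTA_DLLLA;
	}
	return p->lnksta;
}

/*
 * Poll until the chosen bit reaches its "link is ready" state:
 * DLLLA must become set, LT must become clear.  A bounded poll count
 * stands in for the jiffies-based timeout in the real code.
 */
static bool sketch_wait_for_link_status(struct fake_port *p, uint16_t bit,
					int max_polls)
{
	bool want_set = (bit == LNKSTA_DLLLA);

	for (int i = 0; i < max_polls; i++) {
		uint16_t lnksta = fake_read_lnksta(p);

		if (!!(lnksta & bit) == want_set)
			return true;
	}
	return false;
}
```

An ASPM-style caller would pass LNKSTA_LT and the quirk would pass
LNKSTA_DLLLA; a port that never settles makes both variants time out.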

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> +	static const struct pci_device_id ids[] = {
> +		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> +		{}
> +	};
> +	u16 lnksta, lnkctl2;
> +
> +	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> +	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> +		return -1;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> +	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> +	    PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.  If you make a preparatory patch
like this, it would give a place for that background, e.g.,

+bool pcie_link_failed(struct pci_dev *dev)
+{
+	u16 lnksta;
+
+	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+		return false;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+	    PCI_EXP_LNKSTA_LBMS)
+		return true;
+
+	return false;
+}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes
sense for everybody, it could go in pci.c; otherwise it could go in
quirks.c as well.  I guess it's not *truly* generic anyway because it
only detects link training failures for devices that have LNKCTL2 and
link_active_reporting.
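
For reference, the masked compare in that test is true only when LBMS
is set *and* DLLLA is clear.  A small standalone sketch of just that
predicate, assuming nothing beyond the bit values from
include/uapi/linux/pci_regs.h (sketch_link_failed() is a hypothetical
name, and the capability checks are deliberately left out):

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, same values as PCI_EXP_LNKSTA_* in pci_regs.h */
#define LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
#define LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */

/*
 * True only for "LBMS set, DLLLA clear".  All other combinations,
 * including "neither bit set", are not treated as a failed link.
 */
static bool sketch_link_failed(uint16_t lnksta)
{
	return (lnksta & (LNKSTA_LBMS | LNKSTA_DLLLA)) == LNKSTA_LBMS;
}
```

So a port that has never reported a bandwidth management event is left
alone even if its link is down, which is the part whose rationale a
preparatory patch could document.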

> +		unsigned long timeout;
> +		u16 lnkctl;
> +
> +		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +		lnkctl |= PCI_EXP_LNKCTL_RL;
> +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> +		/*
> +		 * Due to an erratum in some devices the Retrain Link bit
> +		 * needs to be cleared again manually to allow the link
> +		 * training to succeed.
> +		 */
> +		lnkctl &= ~PCI_EXP_LNKCTL_RL;
> +		if (dev->clear_retrain_link)
> +			pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +						   lnkctl);
> +
> +		timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> +		do {
> +			pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
> +						 &lnksta);
> +			if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> +				break;
> +			usleep_range(10000, 20000);
> +		} while (time_before(jiffies, timeout));
> +
> +		if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> +			pci_info(dev, "retraining failed\n");
> +			return -1;
> +		}
> +	}
> +	if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> +	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> +	    pci_match_id(ids, dev)) {
> +		u32 lnkcap;
> +		u16 lnkctl;
> +
> +		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> +		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +		lnkctl |= PCI_EXP_LNKCTL_RL;
> +		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> +		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

This starts a retrain; should we wait for training to complete?

> +	}

If we put most of this into a pcie_fix_link_train() (separated from
detecting the *need* to fix something), could it be made to look sort
of like this?
(I suppose you'd want to return bool and rename it so that it reads
naturally, e.g., "pcie_link_forcibly_retrained()",
"pcie_link_retrained()", etc.)

+void pcie_fix_link_train(struct pci_dev *dev)
+{
+	u16 lnkctl2;
+	u32 lnkcap;
+	bool linkup;
+
+	pci_info(dev, "attempting link retrain at 2.5GT/s\n");
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+	lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+	lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+	pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+	linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
+	if (!linkup) {
+		pci_info(dev, "retraining failed\n");
+		return;
+	}
+
+	if (LNKCAP supports only 2.5GT/s)
+		return;
+
+	if (!pci_match_id(ids, dev))
+		return;

Your comment said "if we know this is *safe*"; I can't remember if
pci_match_id() is there to avoid a known problem?

+
+	pci_info(dev, "attempting link retrain at max supported rate\n");
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+	lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+	lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+	pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+	linkup = pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA);
+	if (!linkup)
+		pci_info(dev, "retraining failed\n");
+}

> +
> +	return 0;
> +}
> +
> +/* Same as above, but called for a downstream device.  */
> +static int pcie_upstream_link_retrain(struct pci_dev *dev)
> +{
> +	struct pci_dev *bridge;
> +
> +	bridge = pci_upstream_bridge(dev);
> +	if (bridge)
> +		return pcie_downstream_link_retrain(bridge);
> +	else
> +		return -1;
> +}
> +
>  static int pci_acs_enable;
>
>  /**
> @@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)
>
>  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  {
> +	int retrain = 0;
>  	int delay = 1;
> -	u32 id;
>
>  	/*
>  	 * After reset, the device should not silently discard config
> @@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
>  	 * Command register instead of Vendor ID so we don't have to
>  	 * contend with the CRS SV value.
>  	 */
> -	pci_read_config_dword(dev, PCI_COMMAND, &id);
> -	while (PCI_POSSIBLE_ERROR(id)) {
> +	for (;;) {
> +		u32 id;
> +
> +		pci_read_config_dword(dev, PCI_COMMAND, &id);
> +		if (!PCI_POSSIBLE_ERROR(id)) {
> +			if (delay > PCI_RESET_WAIT)
> +				pci_info(dev, "ready %dms after %s\n",
> +					 delay - 1, reset_type);
> +			break;
> +		}
> +
>  		if (delay > timeout) {
>  			pci_warn(dev, "not ready %dms after %s; giving up\n",
>  				 delay - 1, reset_type);
>  			return -ENOTTY;
>  		}
>
> -		if (delay > PCI_RESET_WAIT)
> +		if (delay > PCI_RESET_WAIT) {
> +			if (!retrain) {
> +				retrain = 1;
> +				if (pcie_upstream_link_retrain(dev) == 0) {
> +					delay = 1;
> +					continue;
> +				}
> +			}
>  			pci_info(dev, "not ready %dms after %s; waiting\n",
>  				 delay - 1, reset_type);
> +		}

Thanks for fixing this in the reset path, too.  Can we move this part
to a separate patch?  It's related to the rest of the patch, but it
looks so different that I think it would be easier to understand by
itself.

I think I might try to fold pcie_upstream_link_retrain() directly in
here because the "upstream link retrain" in the function name doesn't
really make sense in PCIe terms.

Bjorn