On Sat, Feb 10, 2024 at 01:43:50AM +0000, Maciej W. Rozycki wrote: > Only return successful completion status from `pcie_failed_link_retrain' > if retraining has actually been done, preventing excessive delays from > being triggered at call sites in a hope that communication will finally > be established with the downstream device where in fact nothing has been > done about the link in question that would justify such a hope. > > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") > Reported-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@xxxxxxxxxxxxxxx/ > Signed-off-by: Maciej W. Rozycki <macro@xxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx # v6.5+ > --- > drivers/pci/quirks.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > linux-pcie-failed-link-retrain-status-fix.diff > Index: linux-macro/drivers/pci/quirks.c > =================================================================== > --- linux-macro.orig/drivers/pci/quirks.c > +++ linux-macro/drivers/pci/quirks.c > @@ -74,7 +74,8 @@ > * firmware may have already arranged and lift it with ports that already > * report their data link being up. > * > - * Return TRUE if the link has been successfully retrained, otherwise FALSE. > + * Return TRUE if the link has been successfully retrained, otherwise FALSE, > + * also when retraining was not needed in the first place. Can you recast this? I think it's slightly unclear what is returned when retraining is not needed. I *think* you return FALSE when retraining is not needed. Maybe this? Return TRUE if the link has been successfully retrained. Return FALSE if retraining was not needed or we attempted a retrain and it failed. > */ > bool pcie_failed_link_retrain(struct pci_dev *dev) > { > @@ -83,10 +84,11 @@ bool pcie_failed_link_retrain(struct pci > {} > }; > u16 lnksta, lnkctl2; > + bool ret = false; > > if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) || > !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting) > - return false; > + return ret; We know the value here, so IMO it's easier to read if we return "false" instead of "ret". > pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2); > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > @@ -98,9 +100,10 @@ bool pcie_failed_link_retrain(struct pci > lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT; > pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2); > > - if (pcie_retrain_link(dev, false)) { > + ret = pcie_retrain_link(dev, false) == 0; > + if (!ret) { > pci_info(dev, "retraining failed\n"); > - return false; > + return ret; Same here. We're here because !ret was true, so ret must be false. I guess in the next patch you want to return the pcie_retrain_link() return value up the chain, so it will make sense there. > } > > pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); > @@ -117,13 +120,14 @@ bool pcie_failed_link_retrain(struct pci > lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS; > pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2); > > - if (pcie_retrain_link(dev, false)) { > + ret = pcie_retrain_link(dev, false) == 0; > + if (!ret) { > pci_info(dev, "retraining failed\n"); > - return false; > + return ret; > } > } > > - return true; > + return ret; It gets awfully subtle by the time we get here. I guess we could set a "retrain_attempted" flag above and do this: if (retrain_attempted) return true; return false; But I dunno if it's any better. I understand the need for a change like this, but the whole idea of returning failure (false) for a retrain failure and also for a "no retrain needed" is a little mind-bending. > } > > static ktime_t fixup_debug_start(struct pci_dev *dev,