On Wed, 24 Apr 2024, Smita Koralahalli wrote: > Clear Link Bandwidth Management Status (LBMS) if set, on a hot-remove event. > > The hot-remove event could result in target link speed reduction if LBMS > is set, due to a delay in Presence Detect State Change (PDSC) happening > after a Data Link Layer State Change event (DLLSC). > > In reality, PDSC and DLLSC events rarely come in simultaneously. Delay in > PDSC can sometimes be too late and the slot could have already been > powered down just by a DLLSC event. And the delayed PDSC could falsely be > interpreted as an interrupt raised to turn the slot on. This false process > of powering the slot on, without a link forces the kernel to retrain the > link if LBMS is set, to a lower speed to restablish the link thereby > bringing down the link speeds [2]. > > According to PCIe r6.2 sec 7.5.3.8 [1], it is derived that, LBMS cannot > be set for an unconnected link and if set, it serves the purpose of > indicating that there is actually a device down an inactive link. > However, hardware could have already set LBMS when the device was > connected to the port i.e when the state was DL_Up or DL_Active. Some > hardwares would have even attempted retrain going into recovery mode, > just before transitioning to DL_Down. > > Thus the set LBMS is never cleared and might force software to cause link > speed drops when there is no link [2]. > > Dmesg before: > pcieport 0000:20:01.1: pciehp: Slot(59): Link Down > pcieport 0000:20:01.1: pciehp: Slot(59): Card present > pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > pcieport 0000:20:01.1: retraining failed > pcieport 0000:20:01.1: pciehp: Slot(59): No link > > Dmesg after: > pcieport 0000:20:01.1: pciehp: Slot(59): Link Down > pcieport 0000:20:01.1: pciehp: Slot(59): Card present > pcieport 0000:20:01.1: pciehp: Slot(59): No link > > [1] PCI Express Base Specification Revision 6.2, Jan 25 2024. > https://members.pcisig.com/wg/PCI-SIG/document/20590 > [2] Commit a89c82249c37 ("PCI: Work around PCIe link training failures") > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> > --- > 1. Should be based on top of fixes for link retrain status in > pcie_wait_for_link_delay() > https://patchwork.kernel.org/project/linux-pci/list/?series=824858 > https://lore.kernel.org/linux-pci/53b2239b-4a23-a948-a422-4005cbf76148@xxxxxxxxxxxxxxx/ > > Without the fixes patch output would be: > pcieport 0000:20:01.1: pciehp: Slot(59): Link Down > pcieport 0000:20:01.1: pciehp: Slot(59): Card present > pcieport 0000:20:01.1: broken device, retraining non-functional downstream link at 2.5GT/s > pcieport 0000:20:01.1: retraining failed > pcieport 0000:20:01.1: pciehp: Slot(59): No device found. Did you hit the 60 sec delay issue without series 824858? If you've tested them and the fixes helped your case, could you perhaps give Tested-by for that series too (in the relevant thread)? > 2. I initially attempted to wait for both events PDSC and DLLSC to happen > and then turn on the slot. > Similar to: https://lore.kernel.org/lkml/20190205210701.25387-1-mr.nuke.me@xxxxxxxxx/ > but before turning on the slot. > > Something like: > - ctrl->state = POWERON_STATE; > - mutex_unlock(&ctrl->state_lock); > - if (present) > + if (present && link_active) { > + ctrl->state = POWERON_STATE; > + mutex_unlock(&ctrl->state_lock); > ctrl_info(ctrl, "Slot(%s): Card present\n", > slot_name(ctrl)); > - if (link_active) > ctrl_info(ctrl, "Slot(%s): Link Up\n", > slot_name(ctrl)); > - ctrl->request_result = pciehp_enable_slot(ctrl); > - break; > + ctrl->request_result = pciehp_enable_slot(ctrl); > + break; > + } > + else { > + mutex_unlock(&ctrl->state_lock); > + break; > + } > > This would also avoid printing the lines below on a remove event. > pcieport 0000:20:01.1: pciehp: Slot(59): Card present > pcieport 0000:20:01.1: pciehp: Slot(59): No link > > I understand this would likely be not applicable in places where broken > devices hardwire PDS to zero and PDSC would never happen. But I'm open to > making changes if this is more applicable. Because, SW cannot directly > track the interference of HW in attempting link retrain and setting LBMS. > > 3. I tried introducing delay similar to pcie_wait_for_presence() but I > was not successful in picking the right numbers. Hence hit with the same > link speed drop. > > 4. For some reason I was unable to clear LBMS with: > pcie_capability_clear_word(ctrl->pcie->port, PCI_EXP_LNKSTA, > PCI_EXP_LNKSTA_LBMS); LBMS is write-1-to-clear, pcie_capability_clear_word() tries to write 0 there (the accessor doesn't do what you seem to expect, it clears normal bits, not write-1-to-clear bits). > --- > drivers/pci/hotplug/pciehp_pci.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c > index ad12515a4a12..9155fdfd1d37 100644 > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -92,7 +92,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence) > { > struct pci_dev *dev, *temp; > struct pci_bus *parent = ctrl->pcie->port->subordinate; > - u16 command; > + u16 command, lnksta; > > ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n", > __func__, pci_domain_nr(parent), parent->number); > @@ -134,4 +134,10 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence) > } > > pci_unlock_rescan_remove(); > + > + /* Clear LBMS on removal */ > + pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA, &lnksta); > + if (lnksta & PCI_EXP_LNKSTA_LBMS) > + pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA, > + PCI_EXP_LNKSTA_LBMS); It's enough to unconditionally write PCI_EXP_LNKSTA_LBMS, no need to check first. The comment is just spelling out what can already be read from the code so I'd drop the comment. I agree it makes sense to clear the LBMS when device is removed. -- i.