On Wed, Apr 24, 2024 at 03:33:39AM +0000, Smita Koralahalli wrote: > Clear Link Bandwidth Management Status (LBMS) if set, on a hot-remove event. > > The hot-remove event could result in target link speed reduction if LBMS > is set, due to a delay in Presence Detect State Change (PDSC) happening > after a Data Link Layer State Change event (DLLSC). > > In reality, PDSC and DLLSC events rarely come in simultaneously. Delay in > PDSC can sometimes be too late and the slot could have already been > powered down just by a DLLSC event. And the delayed PDSC could falsely be > interpreted as an interrupt raised to turn the slot on. This false process > of powering the slot on, without a link forces the kernel to retrain the > link if LBMS is set, to a lower speed to restablish the link thereby > bringing down the link speeds [2]. > > According to PCIe r6.2 sec 7.5.3.8 [1], it is derived that, LBMS cannot > be set for an unconnected link and if set, it serves the purpose of > indicating that there is actually a device down an inactive link. > However, hardware could have already set LBMS when the device was > connected to the port i.e when the state was DL_Up or DL_Active. Some > hardwares would have even attempted retrain going into recovery mode, > just before transitioning to DL_Down. > > Thus the set LBMS is never cleared and might force software to cause link > speed drops when there is no link [2]. Is this an issue introduced by commit a89c82249c37 ("PCI: Work around PCIe link training failures")? If so, could you add a Fixes tag? I might be mistaken but my impression is that we're seeing a lot of fallout as a result of that commit. Thanks, Lukas