Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk

On Mon, 29 Jan 2024, Bjorn Helgaas wrote:

> On Mon, Jan 29, 2024 at 01:27:09PM +0200, Ilpo Järvinen wrote:
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. Because of the suspend, the device
> > disconnection cannot be detected until portdrv/hotplug have resumed. On
> > runtime resume, pcie_wait_for_link_delay() is called:
> > 
> >   pci_pm_runtime_resume()
> >     pci_pm_bridge_power_up_actions()
> >       pci_bridge_wait_for_secondary_bus()
> >         pcie_wait_for_link_delay()
> > 
> > Because the device is already disconnected, this results in cascading
> > failures:
> > 
> >   1. pcie_wait_for_link_status() returns -ETIMEDOUT.
> > 
> >   2. After the commit a89c82249c37 ("PCI: Work around PCIe link
> >      training failures"),
> 
> I think this also depends on the merge resolution in 1abb47390350
> ("Merge branch 'pci/enumeration'").  Just looking at a89c82249c37 in
> isolation suggests that pcie_wait_for_link_status() returning
> -ETIMEDOUT would not cause pcie_wait_for_link_delay() to call
> pcie_failed_link_retrain().

I was aware of the merge, but I seem to have misanalyzed the return 
values earlier: I can no longer reach my earlier conclusion and now 
agree with your analysis that 1abb47390350 broke it.

That would imply there is a logic error in 1abb47390350 in addition to 
the LBMS-logic problem in a89c82249c37 that my patch is fixing. However, I 
cannot pinpoint a single error because there seem to be several of them 
in the code as a whole.

First of all, this is not true for pcie_failed_link_retrain():
 * Return TRUE if the link has been successfully retrained, otherwise FALSE.
If LBMS is not set, the Target Speed quirk is not applied, but the function 
still returns true. I think it should instead return false early when 
LBMS is not set.
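A minimal user-space model of only the LBMS test, to make the proposed return-value change concrete. It deliberately ignores every other condition the real pcie_failed_link_retrain() checks, and all model_* names are hypothetical; the outcome of the Target Speed quirk is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Bandwidth Management Status: bit 14 of the PCIe Link Status
 * register (same value as PCI_EXP_LNKSTA_LBMS in pci_regs.h). */
#define MODEL_LNKSTA_LBMS 0x4000u

/* Hypothetical stand-in for "retrain at 2.5GT/s, then remove the speed
 * restriction and retrain again"; its outcome is supplied by the caller. */
static bool model_target_speed_quirk(bool quirk_succeeds)
{
    return quirk_succeeds;
}

/* Behavior as described above: when LBMS is clear the quirk is skipped,
 * yet the function still reports that the link was retrained. */
static bool model_retrain_current(uint16_t lnksta, bool quirk_succeeds)
{
    if (lnksta & MODEL_LNKSTA_LBMS)
        return model_target_speed_quirk(quirk_succeeds);
    return true;
}

/* Proposed: return false early when LBMS is not set, so that true
 * really means "the link has been successfully retrained". */
static bool model_retrain_proposed(uint16_t lnksta, bool quirk_succeeds)
{
    if (!(lnksta & MODEL_LNKSTA_LBMS))
        return false;
    return model_target_speed_quirk(quirk_succeeds);
}
```

With LBMS clear, model_retrain_current() reports success although nothing was retrained, which is exactly the mismatch with the "Return TRUE if the link has been successfully retrained" comment.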

But if I make that change, then pcie_wait_for_link_delay() will do 
msleep() + return true, and pci_bridge_wait_for_secondary_bus() will then 
call pci_dev_wait() with its long (~60 s) timeout.
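The return-value chain can be modeled in a few lines. This is a user-space sketch that assumes, as described in this thread, that after the merge a failed link wait (e.g. -ETIMEDOUT) invokes the retrain quirk and its bool result is then reused as the error code. It is inferred from the discussion rather than copied from the tree, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of pcie_wait_for_link_delay(): a failed wait for the link
 * invokes the retrain quirk, and the quirk's bool result is reused as
 * the error code (assumption based on the thread's description). */
static bool model_wait_for_link_delay(int link_wait_rc, bool retrain_result)
{
    int rc = link_wait_rc;

    if (rc)
        rc = retrain_result;   /* true becomes 1, false becomes 0 */
    if (rc)
        return false;          /* caller gives up on the secondary bus */

    /* msleep(delay) would happen here in the real code. */
    return true;
}

/* Model of pci_bridge_wait_for_secondary_bus(): returns true when it
 * would go on to call pci_dev_wait(), which for a disconnected device
 * can only time out after roughly a minute. */
static bool model_calls_pci_dev_wait(int link_wait_rc, bool retrain_result)
{
    return model_wait_for_link_delay(link_wait_rc, retrain_result);
}
```

For the disconnected device the link wait fails with -ETIMEDOUT. A retrain that reports false (the Target Speed quirk failing, or the proposed early return when LBMS is clear) leads into the long pci_dev_wait(), while a retrain that reports true makes the caller give up early. That inversion is the return logic that needs cleaning up.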

I'll try to come up with another patch to clean up all that return logic 
so that it actually starts to make some sense.

> >      pcie_failed_link_retrain() spuriously detects
> >      this failure as a Link Retraining failure and attempts the Target
> >      Speed trick, which also fails.
> 
> Based on the comment below, I guess "Target Speed trick" probably
> refers to the "retrain at 2.5GT/s, then remove the speed restriction
> and retrain again" part of pcie_failed_link_retrain() (which I guess
> is basically the entire point of the function)?

Yes. I'll change the wording slightly to make it more obvious and put 
"(Target Speed quirk)" in parentheses so I can refer to it below.

> >   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> >      cannot succeed (but waits ~1 minute, delaying the resume).
> > 
> > The Target Speed trick (in step 2) is only used if the LBMS bit (PCIe r6.1
> > sec 7.5.3.8) is set. For links that have been operational before
> > suspend, it is quite possible that LBMS has been set at the bridge and
> > remains set. Thus, after resume, LBMS does not indicate that the link
> > needs the Target Speed quirk. Clear LBMS on resume for bridges to avoid
> > the issue.
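LBMS is a write-1-to-clear (RW1C) status bit, so clearing it means writing a value with only that bit set. A minimal user-space sketch of that, assuming a simulated Link Status register; in kernel code the write would presumably be pcie_capability_write_word(dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS), but the exact resume hook the patch uses is not shown in this excerpt, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bits (values match pci_regs.h). */
#define MODEL_LNKSTA_DLLLA 0x2000u /* Data Link Layer Link Active, RO */
#define MODEL_LNKSTA_LBMS  0x4000u /* Link Bandwidth Mgmt Status, RW1C */
#define MODEL_LNKSTA_LABS  0x8000u /* Link Autonomous BW Status, RW1C */

struct model_bridge {
    uint16_t lnksta;   /* simulated Link Status register */
};

/* Simulated config write with RW1C semantics: writing 1 to an RW1C bit
 * clears it, writing 0 leaves it alone, and RO bits ignore writes. */
static void model_write_lnksta(struct model_bridge *b, uint16_t val)
{
    const uint16_t rw1c = MODEL_LNKSTA_LBMS | MODEL_LNKSTA_LABS;

    b->lnksta &= (uint16_t)~(val & rw1c);
}

/* On resume, write only the LBMS bit so a stale LBMS left over from
 * before suspend is cleared without touching other status bits. */
static void model_resume_clear_lbms(struct model_bridge *b)
{
    model_write_lnksta(b, MODEL_LNKSTA_LBMS);
}

/* The Target Speed quirk is only considered when LBMS is set. */
static bool model_lbms_set(const struct model_bridge *b)
{
    return (b->lnksta & MODEL_LNKSTA_LBMS) != 0;
}
```

Writing only the LBMS bit matters because LABS is also RW1C: writing back the whole register value that was read would clear LABS as a side effect.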


-- 
 i.
