On 07/25/2018 11:08 PM, Marek Vasut wrote: > On 06/13/2018 07:25 PM, Bjorn Helgaas wrote: >> On Wed, Jun 13, 2018 at 04:52:52PM +0100, Lorenzo Pieralisi wrote: >>> On Wed, Jun 13, 2018 at 08:53:08AM -0500, Bjorn Helgaas wrote: >>>> On Wed, Jun 13, 2018 at 01:54:51AM +0200, Marek Vasut wrote: >>>>> On 06/11/2018 03:59 PM, Bjorn Helgaas wrote: >>>>>> On Sun, Jun 10, 2018 at 03:57:10PM +0200, Marek Vasut wrote: >>>>>>> On 11/17/2017 06:49 PM, Lorenzo Pieralisi wrote: >>>>>>>> On Fri, Nov 10, 2017 at 10:58:42PM +0100, Marek Vasut wrote: >>>>>>>>> From: Phil Edworthy <phil.edworthy@xxxxxxxxxxx> >>>>>>>>> >>>>>>>>> Most PCIe host controllers support L0s and L1 power states via ASPM. >>>>>>>>> The R-Car hardware only supports L0s, so when the system suspends and >>>>>>>>> resumes we have to manually handle L1. >>>>>>>>> When the system suspends, cards can put themselves into L1 and send a >>>>>>>> >>>>>>>> I assumed L1 entry has to be negotiated depending upon the PCIe >>>>>>>> hierarchy capabilities, I would appreciate if you can explain to >>>>>>>> me what's the root cause of the issue please. >>>>>>> >>>>>>> You should probably ignore the suspend/resume part altogether. The issue >>>>>>> here is that the cards can enter L1 state, while the controller won't do >>>>>>> that automatically, it can only detect that the link went into L1 state. >>>>>>> If that happens,the driver must manually put the controller to L1 state. >>>>>>> The controller can transition out of L1 state automatically though. >>>>>> >>>>>> From earlier discussion I thought the R-Car root port did not >>>>>> advertise L1 support. >>>>> >>>>> Which discussion ? This one or somewhere else ? >>>> >>>> https://lkml.kernel.org/r/HK2PR0601MB1393D917D343E6363484CA68F5CB0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx >>>> >>>> Re-reading that, I think I see my misunderstanding. I was only >>>> considering L1 in the ASPM context. I didn't realize the L1 >>>> implications of devices being in states other than D0. >>>> >>>> Obviously L1 support for ASPM is optional and advertised via Link >>>> Capabilities. But per PCIe r4.0, sec 5.2, L1 support is required for >>>> PCI-PM compatible power management, and is entered "whenever all >>>> Functions ... are programmed to a D-state other than D0." >>>> >>>> So I guess this means *every* device is supposed to support L1 when it >>>> is in a non-D0 power state. I think *this* is the case you're >>>> solving. >>>> >>>> A little more of this detail, e.g., that this issue has nothing to do >>>> with ASPM, it's probably an R-Car erratum that the RC can't transition >>>> from L1 to L0, etc., in the changelog would really help clear things >>>> up for me. >>> >>> I think that the issue is related to the L0->L1 transition upon system >>> suspend (ie the kernel must force the controller into L1 when all >>> devices are in a sleep state) and for this specific reason I still think >>> that checking for a PM_Enter_L1 DLLP reception and doing the L0->L1 >>> transition within a config access is wrong and prone to error (what's >>> the rationale behind that ?), this ought to be done using PM methods in >>> the host controller driver. >> >> But doesn't the problem happen whenever the link goes to L1, for any >> reason? E.g., runtime power management might put an endpoint in D3 >> even if we're not doing a whole system suspend. A user could even >> force the endpoint to D3 by writing to PCI_PM_CTRL with "setpci". If >> that's the case, I don't think the host controller PM methods will be >> enough to work around the issue. > > I think so, it's the link that goes into L1 state and this can happen > without any action from the controller side. > >> The comment in the patch ("If we are not in L1 link state and we have >> received PM_ENTER_L1 DLLP, transition to L1 link state") suggests that >> the R-Car host doesn't handle step 10 in PCIe r4.0, sec 5.3.2.1 >> correctly, i.e., it doesn't complete the transition of the link to L1. >> >> Putting this workaround in the config accessor makes sense to me >> because in this situation the endpoint thinks it's in L1 and it won't >> receive TLPs for config accesses. Apparently forcing the RP to L1 >> completes the L1 entry, and the RP correctly handles the "Exit from L1 >> State" (sec 5.3.2.2) that's required when the RP needs to send a TLP >> to the endpoint. >> >> I think there's still a potential issue if the endpoint goes to a >> non-D0 state, the link is stuck in this transitional state (endpoint >> thinks it's L1, RP thinks it's L0), and the *endpoint* wants to exit >> L1, e.g., so it can send a PME message for a wakeup. I don't know >> what happens then. > > Is there some hardware which I can use to simulate this situation ? > >> If there were a real erratum writeup for this, it would probably >> discuss this situation. > > I went through the latest errata sheet and don't see anything. The > datasheet only mentions that L0/L0s/L1 is supported and L2 is not supported. > > Maybe Phil can comment on this too ? Bump ? > [...] > -- Best regards, Marek Vasut