On Fri, Nov 22, 2019 at 4:00 AM Daniel Drake <drake@xxxxxxxxxxxx> wrote:
>
> On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > I definitely was not understanding this correctly. There is no path
> > for a D3cold -> D3hot transition. Per spec (PCIe r5.0, sec 5.8), the
> > only legal exit from D3cold is to D0uninitialized.
>
> I'm also learning these details as we go.
>
> During runtime suspend, the ACPI _PS3 method (which does exist on this
> device) is called, then _PR3 resources are turned off, which (I think)
> means that the state should now be D3cold.

Correct.

> During runtime resume, the ACPI _PR0 resources are turned on, then
> ACPI _PS0 method is called (and does exist on this device), and my
> reading is that this should put the device in D0.

That should be something like D0uninitialized.

> But then when pci_update_current_state() is called, it reads pmcsr as
> 3 (D3hot). That's not what I would expect. I guess this means that
> this platform's _PR3/_PS3 do not actually allow us to put the device
> into D3cold,

That you can't really say. Anyway, it is not guaranteed to do that.

For example, the power resource(s) listed by _PR3 for the device may
be referenced by something else too, which prevents them from being
turned off.

> and/or the _PR0/_PS0 transition does not actually transition the device to D0.

Yes. That may be the case if the power resource(s) in _PR3 have not
really been turned off.

[To debug this a bit more, you can enable dynamic debug in
drivers/acpi/device_pm.c.]

> While there is some ACPI strangeness here, the D3hot vs D3cold thing
> is perhaps not the most relevant point. If I hack the code to avoid
> D3cold altogether, just trying to do D0->D3hot->D0, it fails in the
> same way.
OK, but then you don't really flip the power resource(s), so that only
means that _PS0 does not restore D0. In general, though, it is only
valid to execute _PS0 after _PS3 (if both are present, which is the
case here), so this is not conclusive either.

> > I know you tried a debug patch to call pci_dev_wait(), and it didn't
> > work, but I'm not sure exactly where it was called. I have these
> > patches on my pci/pm branch for v5.5:
> >
> >   bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier")
> >   395f121e6199 ("PCI/PM: Wait for device to become ready after power-on")
> >
> > The latter adds the wait just before we call
> > pci_raw_set_power_state(). If the device is responding with CRS
> > status, that should be the point where we'd see it. If you have a
> > chance to try it, I'd be interested in the results.
>
> pci_dev_wait() doesn't have any effect no matter where you put it
> because we have yet to observe this device presenting a CRS-like
> condition. According to our earlier experiments, PCI_VENDOR_ID and
> PCI_COMMAND never return the ~0 value that would be needed for
> pci_dev_wait() to have any effect.
>
> I tried the branch anyway and it doesn't solve the issue.
>
> I haven't finished gathering all the logs you asked for, but I tried
> to summarize my current understanding at
> https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that
> helps.

OK, thanks for that!