On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> I definitely was not understanding this correctly.  There is no path
> for a D3cold -> D3hot transition.  Per spec (PCIe r5.0, sec 5.8), the
> only legal exit from D3cold is to D0uninitialized.

I'm also learning these details as we go.

During runtime suspend, the ACPI _PS3 method (which does exist on this
device) is called, then _PR3 resources are turned off, which (I think)
means that the state should now be D3cold.

During runtime resume, the ACPI _PR0 resources are turned on, then the
ACPI _PS0 method is called (and does exist on this device), and my
reading is that this should put the device in D0. But then when
pci_update_current_state() is called, it reads pmcsr as 3 (D3hot).
That's not what I would expect. I guess this means that this platform's
_PR3/_PS3 do not actually allow us to put the device into D3cold,
and/or the _PR0/_PS0 transition does not actually transition the device
to D0.

While there is some ACPI strangeness here, the D3hot vs D3cold thing is
perhaps not the most relevant point. If I hack the code to avoid D3cold
altogether, just trying to do D0->D3hot->D0, it fails in the same way.

> I know you tried a debug patch to call pci_dev_wait(), and it didn't
> work, but I'm not sure exactly where it was called.  I have these
> patches on my pci/pm branch for v5.5:
>
>   bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier")
>   395f121e6199 ("PCI/PM: Wait for device to become ready after power-on")
>
> The latter adds the wait just before we call
> pci_raw_set_power_state().  If the device is responding with CRS
> status, that should be the point where we'd see it.  If you have a
> chance to try it, I'd be interested in the results.

pci_dev_wait() doesn't have any effect no matter where you put it
because we have yet to observe this device presenting a CRS-like
condition. According to our earlier experiments, PCI_VENDOR_ID and
PCI_COMMAND never return the ~0 value that would be needed for
pci_dev_wait() to have any effect. I tried the branch anyway and it
doesn't solve the issue.

I haven't finished gathering all the logs you asked for, but I tried to
summarize my current understanding at
https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that
helps.

Thanks,
Daniel