On Sat, Oct 26, 2019 at 12:28 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > pci_pm_resume_noirq
> > - pci_pm_default_resume_early
> > -- pci_raw_set_power_state(D0)
> >
> > At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
> > read) - so no wait.
>
> Just thinking out loud here: This is before writing PCI_PM_CTRL. The

It's not - it's after writing PCI_PM_CTRL, but before reading it back.

> device should be in D3hot and 0x100403 is PCI_COMMAND_IO |
> PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and
> PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing
> PCI_COMMAND_MASTER, but maybe that got turned off during suspend).
> It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has
> no I/O BARs, but maybe that was set by BIOS at boot-time.

I also checked PCI_COMMAND before writing PCI_PM_CTRL; it has the same
value, 0x403. Immediately after writing PCI_PM_CTRL, it holds the same
value. 10ms later (after pci_dev_d3_sleep()), it holds the same value.
Another 10ms later, it has value 0.

> > pci_raw_set_power_state writes to PM_CTRL and then reads it back
> > with value 0x3.
>
> When you write D0 to PCI_PM_CTRL the device does a soft reset, so
> pci_raw_set_power_state() delays before the next access.
>
> When you read PCI_PM_CTRL again, I think you *should* get either
> 0x0000 (indicating that the device is in D0) or 0xffff (if the read
> failed with a Config Request Retry Status (CRS) because the device
> wasn't ready yet).

PCI_PM_CTRL starts with value 0x103. Then 0 is written and
pci_dev_d3_sleep() delays 10ms. At this point it has value 0x3.
After an additional 10ms delay, it has value 0.

> I can't explain why you would read 0x0003 (not 0xffff) from
> PCI_PM_CTRL.
>
> What happens if you do a dword read from PCI_VENDOR_ID here (after the
> delay but before pci_dev_wait() or reading PCI_PM_CTRL)?

Vendor ID remains 0x1022 at all points.

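For reference, the register values mentioned above can be decoded with a few
lines of C. This is a standalone userspace sketch, not kernel code: the bit
definitions are the ones from include/uapi/linux/pci_regs.h, while the helper
names (cmd_of, status_of, pm_state) are made up purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit definitions as found in include/uapi/linux/pci_regs.h */
#define PCI_COMMAND_IO            0x0001
#define PCI_COMMAND_MEMORY        0x0002
#define PCI_COMMAND_MASTER        0x0004
#define PCI_COMMAND_INTX_DISABLE  0x0400
#define PCI_STATUS_CAP_LIST       0x0010
#define PCI_PM_CTRL_STATE_MASK    0x0003
#define PCI_PM_CTRL_PME_ENABLE    0x0100

/*
 * A 32-bit read at offset PCI_COMMAND returns the command register in the
 * low 16 bits and the status register in the high 16 bits.
 * (Hypothetical helpers, not kernel functions.)
 */
uint16_t cmd_of(uint32_t dword)
{
    return (uint16_t)(dword & 0xffff);
}

uint16_t status_of(uint32_t dword)
{
    return (uint16_t)(dword >> 16);
}

/* Power state field of PCI_PM_CTRL: 0 means D0, 3 means D3hot. */
unsigned int pm_state(uint16_t pmcsr)
{
    return pmcsr & PCI_PM_CTRL_STATE_MASK;
}
```

With these, 0x100403 splits into command 0x0403 (IO | MEMORY | INTX_DISABLE,
no MASTER) and status 0x0010 (CAP_LIST), and both 0x103 and 0x3 in PCI_PM_CTRL
decode to state 3 (D3hot); they differ only in the PME enable bit.
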
> You might also try changing pci_enable_crs() to disable
> PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any
> difference. CRS SV has kind of a checkered history and I'm a little
> dubious about whether it buys us anything.

I stubbed out that register write, which would otherwise have applied to
8 PCI devices (but not the XHCI controllers). It still fails in the same
way unless the delay is increased.

> > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> >
> > At the point of return from pci_pm_resume_noirq, an extra check I
> > added shows that PCI_COMMAND has value 0x403 (16-bit read).
>
> If PCI_COMMAND is non-zero at that point, I think something's wrong.
> It should be zero by the time pci_raw_set_power_state() reads
> PCI_PM_CTRL after the D3 delay. By that time, we assume the reset has
> happened and the device is in D0uninitialized and fully accessible.

It looks like we can detect that the reset has failed (or more precisely,
not quite completed) by reading PCI_COMMAND (value not yet 0) or
PCI_PM_CTRL (doesn't yet indicate D0 state; we already log a warning for
this case).

From that angle, another workaround possibility is to catch that case
and then retry the PCI_PM_CTRL write and delay once more.

Daniel
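
A rough sketch of that retry idea, written as a standalone userspace
simulation rather than a patch. Everything named fake_* and
set_d0_with_retry() is hypothetical and does not exist in the kernel. The
simulated device assumes that a repeated D0 write does not restart the reset,
which has not been verified on the real hardware. Only the 10ms delay and the
0x103 -> 0x3 -> 0 sequence come from the observations above.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_PM_CTRL_STATE_MASK 0x0003  /* as in pci_regs.h */
#define D3_DELAY_MS            10      /* delay seen from pci_dev_d3_sleep() here */

/* Hypothetical stand-in for the device; not a kernel structure. */
struct fake_dev {
    uint16_t pmcsr;       /* current PCI_PM_CTRL contents */
    int d0_requested;     /* set once a D0 write has been seen */
    unsigned int waited;  /* ms elapsed since the first D0 write */
    unsigned int settle;  /* ms the device needs before it reports D0 */
};

/* Simulated config write: the first D0 request starts the reset timer. */
void fake_write_pmcsr(struct fake_dev *dev, uint16_t val)
{
    if ((val & PCI_PM_CTRL_STATE_MASK) == 0 && !dev->d0_requested) {
        dev->d0_requested = 1;
        dev->waited = 0;
        /* Observed: 0x103 reads back as 0x3 right after the write. */
        dev->pmcsr &= PCI_PM_CTRL_STATE_MASK;
    }
}

/* Simulated delay: the device reports D0 once it has had enough time. */
void fake_sleep_ms(struct fake_dev *dev, unsigned int ms)
{
    if (!dev->d0_requested)
        return;
    dev->waited += ms;
    if (dev->waited >= dev->settle)
        dev->pmcsr = 0;
}

uint16_t fake_read_pmcsr(const struct fake_dev *dev)
{
    return dev->pmcsr;
}

/*
 * Write D0, delay, read back; if the state field is still non-zero,
 * repeat the write and delay. Returns the number of writes that were
 * needed, or -1 if the device never reported D0 within max_tries.
 */
int set_d0_with_retry(struct fake_dev *dev, int max_tries)
{
    assert(dev != 0 && max_tries > 0);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        fake_write_pmcsr(dev, 0);
        fake_sleep_ms(dev, D3_DELAY_MS);
        if ((fake_read_pmcsr(dev) & PCI_PM_CTRL_STATE_MASK) == 0)
            return attempt;
    }
    return -1;
}
```

With a simulated device that needs 20ms to settle, a single attempt leaves
PCI_PM_CTRL at 0x3 and the function returns -1, while allowing one retry
succeeds on the second pass. Whether a second write is harmless on this
controller is exactly the open question.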