On Tue, Jun 16, 2020 at 04:17:11PM -0500, Bjorn Helgaas wrote: > On Mon, Jun 15, 2020 at 09:24:13PM +0300, Georgi Djakov wrote: > > The uPD720201 USB3 host controller (connected to PCIe) on the Dragonboard > > 845c is often failing during suspend and resume. The following messages > > are seen over the console: > > > > PM: suspend entry (s2idle) > > Filesystems sync: 0.000 seconds > > Freezing user space processes ... (elapsed 0.001 seconds) done. > > OOM killer disabled. > > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > printk: Suspending console(s) (use no_console_suspend to debug) > > dwc3-qcom a8f8800.usb: HS-PHY not in L2 > > dwc3-qcom a6f8800.usb: HS-PHY not in L2 > > xhci_hcd 0000:01:00.0: can't change power state from D3hot to D0 (config > > space inaccessible) > > xhci_hcd 0000:01:00.0: can't change power state from D3hot to D0 (config > > space inaccessible) > > xhci_hcd 0000:01:00.0: Controller not ready at resume -19 > > xhci_hcd 0000:01:00.0: PCI post-resume error -19! > > xhci_hcd 0000:01:00.0: HC died; cleaning up > > > > Then the USB devices are not functional anymore. Let's disable the PM of > > the controller for now, as this will at least keep USB devices working > > even after suspend and resume. Georgi, can you collect the complete dmesg log and "sudo lspci -vvxxxx" output somewhere? A new report at bugzilla.kernel.org would be a good spot. Maybe we're missing a delay here. The "config space inaccessible" message means we read 0xffff from PCI_PM_CTRL, which probably means the device is still in D3cold. If it were in any other power state, PCI_PM_CTRL should be readable, and 0xffff is not a valid value. Could you also insert a dump_stack() right after we print that "config space inaccessible" message? I don't know enough about power management to understand why we see that message twice. > This seems like we're just covering up a deeper problem here. I think > it would be better to fix the underlying problem. > > The quirk you're adding is specific to the Renesas 0x0014 device. Is > there some reason to think the problem is specific to that device, or > might other devices have the same problem? > > Maybe we're missing something in pcie-qcom.c? Is there any > suspend/resume support required in that driver? It doesn't look like > it has anything except that it calls pm_runtime_enable(). > > > Signed-off-by: Georgi Djakov <georgi.djakov@xxxxxxxxxx> > > --- > > drivers/pci/controller/dwc/pcie-qcom.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c > > index 138e1a2d21cc..c1f502682a19 100644 > > --- a/drivers/pci/controller/dwc/pcie-qcom.c > > +++ b/drivers/pci/controller/dwc/pcie-qcom.c > > @@ -1439,6 +1439,13 @@ static void qcom_fixup_class(struct pci_dev *dev) > > { > > dev->class = PCI_CLASS_BRIDGE_PCI << 8; > > } > > + > > +static void qcom_fixup_nopm(struct pci_dev *dev) > > +{ > > + dev->pm_cap = 0; > > + dev_info(&dev->dev, "Disabling PCI power management\n"); > > +} > > + > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x0101, qcom_fixup_class); > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x0104, qcom_fixup_class); > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x0106, qcom_fixup_class); > > @@ -1446,6 +1453,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x0107, qcom_fixup_class); > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x0302, qcom_fixup_class); > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x1000, qcom_fixup_class); > > DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_QCOM, 0x1001, qcom_fixup_class); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_RENESAS, 0x0014, qcom_fixup_nopm); > > The convention is that DECLARE_PCI_FIXUP_*() comes immediately after > the quirk function itself, so the whole patch would be a single diff > hunk. See drivers/pci/quirks.c for many examples. > > > static struct platform_driver qcom_pcie_driver = { > > .probe = qcom_pcie_probe,