On Tue, Nov 05, 2019 at 11:55:45AM +0100, Rafael J. Wysocki wrote: > On Monday, November 4, 2019 6:39:04 PM CET Bjorn Helgaas wrote: > > [+cc Andrew, Lukas] > > > > On Tue, Oct 15, 2019 at 05:44:47PM +0530, Vidya Sagar wrote: > > > On 10/15/2019 4:40 PM, Sinan Kaya wrote: > > > > ... > > > > I think the PCI core should be putting the device back D0 state as one > > > > of the first actions before enumerating. Wake up could be a combination > > > > of ACPI and/or PCI wake up depending on where your device sits in the > > > > topology. > > > > > > Yup. It is indeed doing it as part of pci_power_up() in pci.c file. > > > But, what is confusing to me is the order of the calls. > > > pci_power_up() has following calls in the same order. > > > pci_raw_set_power_state(dev, PCI_D0); > > > pci_update_current_state(dev, PCI_D0); > > > But, pci_raw_set_power_state() is accessing config space without calling > > > pci_device_is_present() whereas pci_update_current_state() which is called > > > later in the flow is calling pci_device_is_present()...! > > > > A device should always respond to config reads unless it is in D3cold > > or it is initializing after a reset. IIUC you're doing a resume, not > > a reset, so I think your device must be coming out of D3cold. That's > > typically done via ACPI, and I think we are missing some kind of delay > > before our first config access: > > > > pci_power_up > > platform_pci_set_power_state(PCI_D0) # eg, ACPI > > pci_raw_set_power_state > > pci_read_config_word(PCI_PM_CTRL) # <-- first config access > > pci_write_config_word(PCI_PM_CTRL) > > pci_read_config_word(PCI_PM_CTRL) > > pci_update_current_state > > if (... || !pci_device_is_present()) > > > > Mika is working on some delays for the transition out of D3cold [1]. > > He's more concerned with a secondary bus behind a bridge, so I don't > > think his patch addresses this case, but he's certainly familiar with > > this area. > > > > Huh, I'm really confused about this, too. I don't > > understand how resume ever works without any delay between > > platform_pci_power_manageable() and the config reads in > > pci_raw_set_power_state(). I must be missing something. > > There is a delay in the runtime_d3cold case, see > __pci_start_power_transition(). I see the delay in __pci_start_power_transition(), but I don't see how it's relevant. It's only called by pci_set_power_state(), and while many drivers call pci_set_power_state() from legacy .resume() methods, the pci_pm_resume_noirq() path where Vidya is seeing problems doesn't use it. > But overall platform_pci_power_manageable() only checks whether or > not the platform firmware can change the power state of the device. > If it can, it is expected to take care of any necessary delays while > doing that (because there may be delays required by this particular > instance of the platform firmware, beyond what is mandated by the > PCI spec, or there may not be any need to wait at all). ... That sounds like a reasonable argument for why firmware should be responsible for this delay, but I don't think that's very clear in the ACPI spec, so I wouldn't be surprised if it got missed. Based on Vidya's backtrace, I think the resume path with problems is this: pci_pm_resume_noirq pci_pm_default_resume_early pci_power_up if (platform_pci_power_manageable(dev)) platform_pci_set_power_state(dev, PCI_D0) # <-- FW delay here? pci_raw_set_power_state pci_update_current_state pci_device_is_present # <-- config read returns CRS So I think your suggestion is that Vidya's firmware should be doing the delay inside platform_pci_set_power_state()? Vidya, you typically work on Tegra, so I assume this is on an arm64 system? Does it have ACPI? Do you have access to the firmware developers to ask about who they expect to do the delays? > In any case, I'm not sure how useful it is to add delays for > everyone in the cases in which a specific system needs a delay > because of its own PM implementation limitations. It may be better > to quirk such systems explicitly as long as there are not too many > quirks in there, or we'll end up adding more and more *implicit* > quirks in the form of general delays. I agree, a general delay doesn't sound good. Are you thinking something like this? void pci_power_up(struct pci_dev *dev) { if (platform_pci_power_manageable(dev)) { platform_pci_set_power_state(dev, PCI_D0); if (dev->XXX) msleep(dev->XXX); } ... We already have dev->d3_delay and d3cold_delay, so it's getting a bit messy to keep them all straight. Bjorn