[+cc Rafael for ACPI power transition from D3cold to D0 delay] On Mon, Aug 26, 2024 at 02:16:34PM -0500, Mario Limonciello wrote: > On 8/23/2024 14:54, Bjorn Helgaas wrote: > > On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote: > > > If a dock is plugged in at the same time as autosuspend delay > > > then this can cause malfunctions in the USB4 stack. This happens > > > because the device is still in D3cold at the time that the PCI > > > core handed control back to the USB4 stack. > ... > > > A device that has gone through a reset may return a value in > > > PCI_COMMAND but that doesn't mean it's finished transitioning to > > > D0. For devices that support power management explicitly check > > > PCI_PM_CTRL on everything but system resume to ensure the > > > transition happened. > ... > > > Devices that don't support power management and system resume > > > will continue to use PCI_COMMAND. > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int > > > * the read (except when CRS SV is enabled and the read was for the > > > * Vendor ID; in that case it synthesizes 0x0001 data). > > > * > > > - * Wait for the device to return a non-CRS completion. Read the > > > - * Command register instead of Vendor ID so we don't have to > > > - * contend with the CRS SV value. > > > + * Wait for the device to return a non-CRS completion. On devices > > > + * that support PM control and on waits that aren't part of system > > > + * resume read the PM control register to ensure the device has > > > + * transitioned to D0. On devices that don't support PM control, > > > + * or during system resume read the command register to instead of > > > + * Vendor ID so we don't have to contend with the CRS SV value. > > > */ > > > for (;;) { > > > - u32 id; > > > - > > > if (pci_dev_is_disconnected(dev)) { > > > pci_dbg(dev, "disconnected; not waiting\n"); > > > return -ENOTTY; > > > } > > > - pci_read_config_dword(dev, PCI_COMMAND, &id); > > > - if (!PCI_POSSIBLE_ERROR(id)) > > > - break; > > > + if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) { > > > + u16 pmcsr; > > > + > > > + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > > > + if (!PCI_POSSIBLE_ERROR(pmcsr) && > > > + (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0) > > > + break; > > > + } else { > > > + u32 id; > > > + > > > + pci_read_config_dword(dev, PCI_COMMAND, &id); > > > + if (!PCI_POSSIBLE_ERROR(id)) > > > + break; > > > + } > > > > What is the rationale behind using PCI_PM_CTRL in some cases and > > PCI_COMMAND in others? > > We saw a deadlock during resume from suspend when PCI_PM_CTRL was used for > all cases that supported dev->pm_cap. > > > Is there some spec language we can cite for > > this? > > Perhaps it being a "cold reset" during resume? > > > IIUC, pci_dev_wait() waits for a device to be ready after a reset > > (FLR, D3hot->D0 transition for devices where No_Soft_Reset is clear, > > DPC) and after power-up from D3cold (pci_pm_bridge_power_up_actions()). > > I think device readiness in all of these cases is covered by PCIe > > r6.0, sec 6.6.1. > ... > > If the Root Port above the device supports Configuration RRS > > Software Visibility, I think we probably should use that here > > instead of either PCI_COMMAND or PCI_PM_CTRL. > > I did check and in this case the root port the USB4 routers are > connected to support this. > > How do you think this should be done instead? > > > SR-IOV VFs don't implement Vendor ID, which might complicate this > > a little. But it niggles in my mind that there may be some other > > problem beyond that. Maybe Alex remembers. > > > > Anyway, if we meet the requirements of sec 6.6.1, the device > > should be ready to respond to config requests, and I assume that > > also means the device is in D0. > > Within that section there is a quote to point out: > > " > To allow components to perform internal initialization, system > software must wait a specified minimum period following exit from a > Conventional Reset of one or more devices before it is permitted to > issue Configuration Requests to those devices > " > > In pci_power_up() I don't really see any hardcoded delays > specifically for this case of leaving D3cold. The PCI PM spec > specifies that it will take "Full context restore or boot latency". > I don't think it's reasonable to have NO delay. I agree, leaving D3cold means "applying power to the device", and PCIe r6.0, sec 6.6.1, says that's a Fundamental Reset: A Fundamental Reset must occur following the application of power to the component. This is called a Cold Reset. So we need a delay similar to what we do in pci_bridge_wait_for_secondary_bus(). I don't know whether that is supposed to happen somewhere on the ACPI side or in the PCI side, but my inclination would be the PCI side because the delay isn't really platform-specific, it's specified by the PCI specs, and the OS needs to manage the RRS Software Visibility retries and features like Readiness Notifications. pci_set_low_power_state() contains corresponding delays for putting devices in D1, D2, and D3hot. Maybe the D3cold -> D0 delays should be in platform_pci_set_power_state()? I think pci_power_up() is the only caller that sets the state to D0, and it assumes the device is Configuration Ready when platform_pci_set_power_state() returns: pci_power_up platform_pci_set_power_state pci_read_config_word(PCI_PM_CTRL) So I think we need basically the same delays and pci_dev_wait() stuff in this path. Bjorn