Re: [PATCH v18 2/2] PCI: Add a quirk for AMD PCIe root ports w/ USB4 controllers



On 9/13/2023 10:40, Bjorn Helgaas wrote:
On Wed, Sep 13, 2023 at 12:20:14PM +0200, Rafael J. Wysocki wrote:
On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
<mario.limonciello@xxxxxxx> wrote:

Iain reports that USB devices can't be used to wake a Lenovo Z13
from suspend. This is because the PCIe root port has been put
into D3hot and AMD's platform can't handle USB devices waking the
platform from a hardware sleep state in this case.

It would be good to mention the PMC involvement, because it is
necessary to trigger the issue IIUC.

Apparently, if a Root Port is in D3hot at the time the PMC is called
to reduce the platform power, the PMC takes that as a license to do
something that prevents wakeup signaling from working.

This absolutely needs to be part of the commit log and the patch.

If the device advertises PME_Support for D3hot or D3cold, but we don't
actually get those PMEs after putting it in D3hot or D3cold, that's a
bug in the device.  "AMD's platform can't handle devices waking from
hardware sleep" isn't specific enough to help future PCI maintenance
because "hardware sleep state" is not a PCI concept.


This problem only occurs on Linux, when waking from a platform hardware
sleep state. Comparing the behavior on Windows and Linux, Windows
doesn't put the root ports into D3.

In Windows, systems that support Modern Standby specify hardware
pre-conditions for the SoC to achieve the lowest power state by device
constraints in a SoC-specific "Power Engine Plugin" (PEP) [1] [2].
They can be marked as disabled or enabled and when enabled can specify
the minimum power state required for an ACPI device.

The policy on Linux does not take constraints into account to decide what
state to put the device into at suspend like Windows does.

I'm not sure whether or not it is really clear what happens in Windows
nor whether it is relevant for this patch.

The relevant information is that Windows keeps these ports in D0 and
that demonstrably prevents the PMC from using a platform state in
which PCIe wakeup doesn't work.  Therefore Linux needs to do the same
thing, but only if system wakeup is enabled for them (or the devices

So it sounds like either of these scenarios would work:

   A) Root Port stays in D0, PMC selects platform state X, wakeups still
      work

   B) Root Port in D3hot, PMC selects platform state Y that doesn't
      break wakeups, so wakeups still work

PCI isn't in a position to pick one over the other because it has no
idea what the tradeoffs are.


IIUC, this quirk basically forces scenario A (although a naive reading
would suggest that we could still put the Root Port in D1 or D2, since
the quirk only mentions D3).

I haven't done any testing with D1 or D2 as Linux doesn't select these states.

Rather, for devices that support D3, the target state is selected by
this policy:
1. If platform_pci_power_manageable():
    Use platform_pci_choose_state()
2. If the device is armed for wakeup:
    Select the deepest D-state that supports a PME.
3. Else:
    Use D3hot.
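
The three-step policy above can be sketched as a small standalone model.
This is only an illustration, not the kernel's pci_target_state(): the
model_* names, the struct layout, and the fall-back to D0 when no state
can signal a PME are assumptions made for this sketch.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's struct pci_dev. */
enum model_state { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3HOT, MODEL_D3COLD };

struct model_dev {
	bool platform_manageable;         /* stands in for platform_pci_power_manageable() */
	enum model_state platform_choice; /* stands in for platform_pci_choose_state() */
	bool wakeup_armed;                /* device is armed for wakeup */
	unsigned int pme_mask;            /* bit n set: PME can be signaled from state n */
};

static enum model_state model_target_state(const struct model_dev *dev)
{
	int s;

	/* 1. The platform decides when it can power-manage the device. */
	if (dev->platform_manageable)
		return dev->platform_choice;

	/* 2. Armed for wakeup: deepest state that can still signal a PME. */
	if (dev->wakeup_armed) {
		for (s = MODEL_D3COLD; s > MODEL_D0; s--)
			if (dev->pme_mask & (1u << s))
				return (enum model_state)s;
		return MODEL_D0; /* assumption of this sketch */
	}

	/* 3. Otherwise use D3hot. */
	return MODEL_D3HOT;
}
```

In this model, a root port that is not platform power manageable and
advertises PME from D3hot ends up in D3hot when armed for wakeup, which
is the case the quirk is meant to catch.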

Devices are considered power manageable by the platform when they have
one or more objects described in the table in section 7.3 of the ACPI 6.5
specification [3].

If devices are not considered power manageable, specs are ambiguous as
to what should happen.  In this situation Windows 11 leaves PCIe
ports in D0 while Linux puts them into D3 due to the policy introduced by
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").

As the Windows PEP driver uses constraints to express the desired state
that should be selected for suspend  but Linux doesn't introduce a quirk
for the problematic root ports.

I would say "but Linux doesn't do that, so ...", because it currently
reads like the quirk was not present which is slightly confusing.

When selecting a target state specifically for sleep in
`pci_prepare_to_sleep` this quirk will prevent the root ports from
selecting D3hot or D3cold if they have been configured as a wakeup source.

Cc: stable@xxxxxxxxxxxxxxx
Link: [1]
Link: [2]
Link: [3]
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
The same PCI ID is used for multiple different root ports.  This problem
only affects the root port that the USB4 controller is connected to.

If true, this seems important, not something to discard because it's
after "---".


  drivers/pci/pci.c    |  5 +++++
  drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
  include/linux/pci.h  |  2 ++
  3 files changed, 35 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59c01d68c6d5..a113b8941d09 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
         if (target_state == PCI_POWER_ERROR)
                 return -EIO;

+       /* quirk to avoid setting D3 */
+       if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&

Why did you pick dev_flags?  If there's not a reason to prefer that,
I'd just add a 1-bit bitfield because that doesn't require a new

There was no strong reason for it. A 1-bit bitfield in struct pci_dev will actually make it easier for this quirk to live in a more proper home for the situation (drivers/platform/x86/amd/pmc/pmc.c).

+          (target_state == PCI_D3hot || target_state == PCI_D3cold))
+               target_state = PCI_D0;
         pci_enable_wake(dev, target_state, wakeup);

         error = pci_set_power_state(dev, target_state);
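
The effect of the hunk above can be modeled in isolation. This is a
minimal sketch, assuming a plain boolean in place of the dev_flags bit;
the sketch_* names are hypothetical and not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_power_t values. */
enum sketch_state { SKETCH_D0, SKETCH_D1, SKETCH_D2, SKETCH_D3HOT, SKETCH_D3COLD };

/*
 * Return the state the port would actually be put into for sleep.
 * Only D3hot and D3cold are overridden, and only when wakeup is enabled
 * and the port carries the quirk; every other target passes through.
 */
static enum sketch_state sketch_apply_no_wake_d3(enum sketch_state target,
						 bool wakeup, bool no_wake_d3)
{
	if (wakeup && no_wake_d3 &&
	    (target == SKETCH_D3HOT || target == SKETCH_D3COLD))
		return SKETCH_D0;

	return target;
}
```

Note that a D1 or D2 target is returned unchanged in this model, which
matches the "naive reading" mentioned earlier in the thread.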

+ * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
+ * states may cause problems when the system attempts wake up from s2idle.
+ *
+ * This manifests as a missing wakeup interrupt on the following systems:
+ * Family 19h model 44h (PCI 0x14b9)
+ * Family 19h model 74h (PCI 0x14eb)
+ * Family 19h model 78h (PCI 0x14eb)
+ *
+ * Add a quirk to the root port if a USB4 controller is connected to it
+ * to avoid D3 power states.

I want to know whether this is D3hot, D3cold, or both.  Also in the
pci_info() below.

Linux doesn't select D3cold for this root port, but it should affect both.

Also, do we have some indication that this is specific to Ryzen?  If
not, I assume this is an ongoing issue, and matching on Device IDs
just means we'll have to debug the same problem again and add more

This is why my earlier attempts (v16 and v17) tried to tie it to constraints. These are what the uPEP driver in Windows uses to decide which power state to put integrated devices like the root port into.

In Windows, if no uPEP driver is installed, "Windows internal policy" dictates what happens. If the uPEP driver is installed, it influences the policy based upon the constraints.

Rafael had feedback against constraints in v17, which is why I'm back to a quirk for v18.

This issue as I've described it is specific to AMD Ryzen.
I expect it to be an ongoing issue. I also expect that, unless we use constraints or convince the firmware team to add a _S0W object with a value of "0" for the sake of Linux, we will be adding IDs every year to wherever this lands as we reproduce it on newer SoCs.

+static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
+       struct pci_dev *child = NULL;
+       while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
+               if (pci_upstream_bridge(child) != pdev)
+                       continue;
+               pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
+               pci_info(pdev, "quirk: disabling D3 for wakeup\n");

I don't remember seeing any evidence that this is a USB4-specific
issue.  My guess is that it affects wakeups from *any* device below
these Root Ports, since I assume the PMEs are bog standard PCIe
events, not anything special about USB4.

The hardware team describes the issue to me as specific to how the internal interrupt routing works with the USB4 controller connected to this root port.
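
Since the same PCI ID is shared by several root ports, the check that
matters is whether a USB4 controller sits directly below a given port.
A minimal sketch of that check over a toy device list follows; the toy_*
types and names are hypothetical, and the upstream pointer merely stands
in for what pci_upstream_bridge() returns in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device classes for this sketch only. */
enum toy_class { TOY_CLASS_OTHER, TOY_CLASS_USB4 };

struct toy_dev {
	enum toy_class dev_class;
	const struct toy_dev *upstream; /* stands in for pci_upstream_bridge() */
};

/* True if any USB4 controller in devs[] has 'port' as its upstream bridge. */
static bool toy_port_has_usb4_child(const struct toy_dev *port,
				    const struct toy_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].dev_class == TOY_CLASS_USB4 &&
		    devs[i].upstream == port)
			return true;

	return false;
}
```

Two ports with an identical ID are told apart here purely by what is
attached below them, so only the port hosting the USB4 controller would
be marked.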

It sounds like this is only an issue when amd_pmc_s2idle_prepare() is
involved, right?  The PMEs and wakeups work as expected until we tell
the PMC to do its magic thing?

If so, shouldn't this be conditional on something in amd/pmc.c to
connect these pieces together?  Looks like amd/pmc.c only works if
the platform provides an AMDI0005, AMDI0006, etc., ACPI device?

I think it'd be nice if amd_pmc_probe() logged a hint about it being

I personally really thought the constraints approach from v16 and v17 did this well and would have scaled effectively.

As Rafael is opposed to it, what I'm thinking from everyone's feedback today is to add code into amd_pmc_probe() that twiddles a new bit for the matching device in `struct pci_dev`, maybe called `no_d3_for_wakeup`.

Then as we add PMC support for new devices, we can add a new line to a switch/case to set that bit if necessary for the platform.
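
A rough standalone sketch of that idea follows. It is not the eventual
amd/pmc code: the probe_* names and the struct are hypothetical, keying
the switch on the root port device ID is an assumption of this sketch,
and the two IDs are simply the ones listed in the quirk comment above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct pci_dev. */
struct probe_dev {
	uint16_t device_id;
	unsigned int no_d3_for_wakeup:1; /* proposed 1-bit flag */
};

/* Set the flag only for root ports known to be affected. */
static void probe_mark_no_d3(struct probe_dev *rp)
{
	switch (rp->device_id) {
	case 0x14b9: /* Family 19h model 44h */
	case 0x14eb: /* Family 19h models 74h and 78h */
		rp->no_d3_for_wakeup = 1;
		break;
	default:
		/* Unknown or unaffected platform: leave the policy alone. */
		break;
	}
}
```

Supporting a new SoC would then mean adding one more case label, at the
cost of revisiting the list each time a new platform reproduces the
problem.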

AFAICS it only logs something on errors.  This has been
incredibly painful to debug.  It looks like the PMCs do very subtle
power management things, and it'd be nice to have a hint that there's
really fancy stuff going on in the background.

Sure, I'll add a dev_info or pci_info when it's set.
