On 12/11/2024 06:47, Werner Sembach wrote:
Hi,
Am 10.12.24 um 17:00 schrieb Mario Limonciello:
On 12/10/2024 09:24, Werner Sembach wrote:
Hi,
Am 09.12.24 um 20:45 schrieb Mario Limonciello:
On 12/9/2024 13:36, Werner Sembach wrote:
From: Mario Limonciello <mario.limonciello@xxxxxxx>
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
sets the policy that all PCIe ports are allowed to use D3. When
the system is suspended if the port is not power manageable by the
platform and won't be used for wakeup via a PME this sets up the
policy for these ports to go into D3hot.
This policy generally makes sense from an OSPM perspective but it
leads
to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1
with
an unupdated BIOS.
- On family 19h model 44h (PCI 0x14b9) this manifests as a missing
wakeup
interrupt.
- On family 19h model 74h (PCI 0x14eb) this manifests as a system
hang.
On the affected Device + BIOS combination, add a quirk for the PCI
device
ID used by the problematic root port on both chips to ensure that
these
root ports are not put into D3hot at suspend.
This patch is based on
https://lore.kernel.org/linux-pci/20230708214457.1229-2-
mario.limonciello@xxxxxxx/
but with the added condition both in the documentation and in the
code to
apply only to the TUXEDO Sirius 16 Gen 1 with the original
unpatched BIOS.
Co-developed-by: Georg Gottleuber <ggo@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Georg Gottleuber <ggo@xxxxxxxxxxxxxxxxxxx>
Co-developed-by: Werner Sembach <wse@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Werner Sembach <wse@xxxxxxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 6.1+
Reported-by: Iain Lane <iain@xxxxxxxxxxxxxxxxxxx>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-
suspend-with-external-USB-keyboard/m-p/5217121
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 76f4df75b08a1..2226dca56197d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3908,6 +3908,37 @@ static void
quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Putting PCIe root ports on Ryzen SoCs with USB4 controllers
into D3hot
+ * may cause problems when the system attempts wake up from s2idle.
+ *
+ * On family 19h model 44h (PCI 0x14b9) this manifests as a
missing wakeup
+ * interrupt.
+ * On family 19h model 74h (PCI 0x14eb) this manifests as a system
hang.
+ *
+ * This fix is still required on the TUXEDO Sirius 16 Gen1 with
the original
+ * unupdated BIOS.
+ */
+static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
+ {
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
+ DMI_MATCH(DMI_BOARD_NAME, "APX958"),
+ DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
+ },
+ },
+ {}
+};
+
+static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
+{
+ if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
+ !acpi_pci_power_manageable(pdev))
+ pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9,
quirk_ryzen_rp_d3);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb,
quirk_ryzen_rp_d3);
#endif
/*
Wait, what is wrong with:
commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD
Rembrandt and Phoenix USB4")
Is that not activating on your system for some reason?
Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had
blackscreen on wakeup.
OK, I think we need to dig a layer deeper to see which root port is
causing problems to understand this.
With a newer BIOS for that device suspend and resume however works.
Is there any reason that people would realistically be staying on the
old BIOS and instead we need to carry this quirk in the kernel for
eternity?
Fear of device bricking or not knowing an update is available.
The not knowing is solved by publishing firmware to LVFS.
Most "popular" distributions include fwupd, regularly check for updates
and will notify users through the MOTD or a graphical application that
there is something available.
With the Linux ecosystem for BIOS updates through fwupd + LVFS it's
not a very big barrier to entry to do an update like it was 20 years ago.
Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh
scripts which still is the main form ODMs provide bios updates.
It's tangential to this thread; but generally ODMs will provide you a
capsule if you ask for one.
Anyhow if you are providing scripts and random EFI binaries in order to
get things updated, that explains why this is a large enough chunk of
people to justify a patch like this.
Looking in the patch the device id's are different (0x162e, 0x162f,
0x1668, and 0x1669).
TUXEDO Sirius 16 Gen1 is Phoenix based, right? So at a minimum you
shouldn't be including PCI IDs from Rembrandt (0x14b9)
Thanks for the hint, I can delete that then.
Here is the topology from a Phoenix system on my side:
https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
Sorry for the noob question: How do I get that format? it's not lspci -
tvnn on my system
No worry.
Having looked at a lot of s2idle bugs I've found it's generally helpful
to know what ACPI companion is assigned to devices and what are parents.
So it's actually created by this function in the s2idle issue triage
script, amd_s2idle.py.
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945
And while on the topic, does your "broken" BIOS report this from that
script?
'''
Platform may hang resuming. Upgrade your firmware or add
pcie_port_pm=off to kernel command line if you have problems
'''
But i think this contains the same information:
$ lspci -Pnn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Root Complex [1022:14e8]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU
[1022:14e9]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Dummy Host Bridge [1022:14ea]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
GPP Bridge [1022:14ed]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Dummy Host Bridge [1022:14ea]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
GPP Bridge [1022:14ee]
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
GPP Bridge [1022:14ee]
00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
GPP Bridge [1022:14ee]
00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
GPP Bridge [1022:14ee]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Dummy Host Bridge [1022:14ea]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
USB4/Thunderbolt PCIe tunnel [1022:14ef]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Dummy Host Bridge [1022:14ea]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Dummy Host Bridge [1022:14ea]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
Internal GPP Bridge to Bus [C:A] [1022:14eb]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
Internal GPP Bridge to Bus [C:A] [1022:14eb]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
Internal GPP Bridge to Bus [C:A] [1022:14eb]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus
Controller [1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC
Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 0 [1022:14f0]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 1 [1022:14f1]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 2 [1022:14f2]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 3 [1022:14f3]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 4 [1022:14f4]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 5 [1022:14f5]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 6 [1022:14f6]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix
Data Fabric; Function 7 [1022:14f7]
00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI]
Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/
ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro
Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M
XT/7600S/7700S / PRO W7600] [1002:7480] (rev c7)
00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc.
[AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller
[10ec:8168] (rev 15)
00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi
6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics
Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices,
Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
Rembrandt Radeon High Definition Audio Controller [1002:1640]
00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc.
[AMD] Phoenix CCP/PSP 3.0 Device [1022:15c7]
00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
Device [1022:15b9]
00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
Device [1022:15ba]
00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc.
[AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
Family 17h/19h/1ah HD Audio Controller [1022:15e3]
00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
00:08.2/00.1 Signal processing controller [1180]: Advanced Micro
Devices, Inc. [AMD] AMD IPU Device [1022:1502]
00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
Device [1022:15c0]
00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
Device [1022:15c1]
00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
Pink Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
That's why 7d08f21f8c630 intentionally matches the NHI and then
changes the root port right above that instead of all the root ports -
because that is where the problem was.
For some reason it doesn't seem to trigger on my system (added debug
output) I will look further into it why that happens.
You never see this message in your logs at suspend time (even on a
"fixed" BIOS)?
"quirk: disabling D3cold for suspend"
I'm /suspecting/ you do see it, but you're having problems with another
root port.
I mentioned this in my previous iterations of patches that eventually
landed on that quirk, but Windows and Linux handle root ports
differently at suspend time and that could be why it's exposing your
BIOS bug.
If you can please narrow down which root ports actually need the quirk
for your side (feel free to do a similar style to 7d08f21f8c630) I think
we could land on something more narrow and upstreamable.
At a minimum what you're doing today is covering both Rembrandt and
Phoenix and it should only apply to Phoenix.