Hello,
I am having an intermittent problem with resuming from s2idle. The
problem seems to start when entering the s2idle state: the laptop
appears suspended, but the power draw is high and the laptop stays warm
over time. Attempting to resume fails, and I have to power the laptop
off completely.
Can somebody please help me troubleshoot this? I am able to test
patches and experiment, but I'm out of my depth with trying to figure
this out on my own.
If there is a better place to ask this, please let me know.
I first posted about the problem here:
https://community.frame.work/t/linux-framework-16-intermittent-failure-to-resume-from-suspend/58674
System details are:
* Framework Laptop 16 (without GPU module)
* Ryzen 7 7840HS
* Debian Sid
I first saw the problem with kernel 6.10.6. To troubleshoot further, I
have since updated to git 09f6b0c8904bfaa1e0601bc102e1b6aa6de8c98f
(from yesterday).
I tried to find some debugging information on my own. The remainder
of this message is about that effort, but if I'm on the wrong track,
please disregard the following.
I found this article:
https://www.phoronix.com/news/AMD-MP2-STB-Suspend-Resume
...and hoped the STB (Smart Trace Buffer) would give me some useful information.
As far as I can tell from the code, I need to load the amd_pmc module
with enable_stb=1.
lizard:~# rmmod amd_pmc
lizard:~# modprobe amd_pmc enable_stb=1
If I do that, though:
1. There is an error: 'amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff'
2. There is a kernel WARNING (which I will paste in full below):
ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
3. The expected files in debugfs do not appear.
I added some printk statements to the driver in order to try to find
out what is happening.
The trouble seems to be in amd_pmc_s2d_init() and the results it gets
back from amd_pmc_send_cmd():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/amd/pmc/pmc.c#n978
	/* Get DRAM size */
	ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
	printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
	if (ret || !dev->dram_size)
		dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;

	/* Get STB DRAM address */
	amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_LOW, &phys_addr_low, dev->s2d_msg_id, true);
	amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_HIGH, &phys_addr_hi, dev->s2d_msg_id, true);
For the call to retrieve S2D_DRAM_SIZE, the return value is -5 (-EIO),
so the driver falls back to S2D_TELEMETRY_DRAMBYTES_MAX.
For the call to retrieve S2D_PHYS_ADDR_LOW, the return value is 0,
but phys_addr_low is also 0, which seems wrong.
For S2D_PHYS_ADDR_HIGH, phys_addr_hi is 0 as well.
I think both phys_addr values being 0 is what triggers the ioremap
warning: the driver ends up mapping 16 MiB starting at physical
address 0.
Is this a driver bug, or a hardware limitation?
Below are my debug patch and the kernel output from loading
'amd_pmc enable_stb=1'.
----------------------------------------------------------------------
commit ed7a2784cf6a19796734b8aca87a260c4ff1f752
Author: Corey Hickey <bugfood-c@xxxxxxxxxx>
Date: Fri 2024-10-11 23:13:40
debug
diff --git a/drivers/platform/x86/amd/pmc/mp2_stb.c b/drivers/platform/x86/amd/pmc/mp2_stb.c
index 9775ddc1b27a..718b01266bff 100644
--- a/drivers/platform/x86/amd/pmc/mp2_stb.c
+++ b/drivers/platform/x86/amd/pmc/mp2_stb.c
@@ -228,10 +228,12 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
struct pci_dev *pdev;
int rc;
+ printk(KERN_INFO "amd_mp2_stb_init 1\n");
mp2 = devm_kzalloc(dev->dev, sizeof(*mp2), GFP_KERNEL);
if (!mp2)
return;
+ printk(KERN_INFO "amd_mp2_stb_init 2\n");
pdev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MP2_STB, NULL);
if (!pdev)
return;
@@ -239,24 +241,28 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
dev->mp2 = mp2;
mp2->pdev = pdev;
+ printk(KERN_INFO "amd_mp2_stb_init 3");
mp2->devres_gid = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
if (!mp2->devres_gid) {
dev_err(&pdev->dev, "devres_open_group failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 4\n");
rc = pcim_enable_device(pdev);
if (rc) {
dev_err(&pdev->dev, "pcim_enable_device failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 5\n");
rc = pcim_iomap_regions(pdev, BIT(MP2_MMIO_BAR), "mp2 stb");
if (rc) {
dev_err(&pdev->dev, "pcim_iomap_regions failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 6\n");
mp2->mmio = pcim_iomap_table(pdev)[MP2_MMIO_BAR];
if (!mp2->mmio) {
dev_err(&pdev->dev, "pcim_iomap_table failed\n");
@@ -265,6 +271,7 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
pci_set_master(pdev);
+ printk(KERN_INFO "amd_mp2_stb_init 7\n");
rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (rc) {
dev_err(&pdev->dev, "failed to set DMA mask\n");
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index bbb8edb62e00..6ca497473d78 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -627,6 +627,7 @@ static void amd_pmc_dbgfs_unregister(struct amd_pmc_dev *dev)
static bool amd_pmc_is_stb_supported(struct amd_pmc_dev *dev)
{
+ printk(KERN_INFO "amd_pmc_is_stb_supported cpu_id: %d\n", dev->cpu_id);
switch (dev->cpu_id) {
case AMD_CPU_ID_YC:
case AMD_CPU_ID_CB:
@@ -986,11 +987,13 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
dev->msg_port = 1;
amd_pmc_send_cmd(dev, S2D_TELEMETRY_SIZE, &size, dev->s2d_msg_id, true);
+ printk(KERN_INFO "amd_pmc_s2d_init size: %u\n", size);
if (size != S2D_TELEMETRY_BYTES_MAX)
return -EIO;
/* Get DRAM size */
ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
+ printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
if (ret || !dev->dram_size)
dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;
@@ -1003,7 +1006,9 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
/* Clear msg_port for other SMU operation */
dev->msg_port = 0;
+ printk(KERN_INFO "amd_pmc_s2d_init p_a_l: %u p_a_hi: %u s_p_a: %llu sz: %u\n", phys_addr_low, phys_addr_hi, stb_phys_addr, dev->dram_size);
dev->stb_virt_addr = devm_ioremap(dev->dev, stb_phys_addr, dev->dram_size);
+ printk(KERN_INFO "amd_pmc_s2d_init dsva: %p\n", dev->stb_virt_addr);
if (!dev->stb_virt_addr)
return -ENOMEM;
@@ -1047,6 +1052,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
int err;
u32 val;
+ printk(KERN_INFO "amd_pmc_probe: 1\n");
dev->dev = &pdev->dev;
rdev = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
@@ -1057,12 +1063,14 @@ static int amd_pmc_probe(struct platform_device *pdev)
dev->cpu_id = rdev->device;
+ printk(KERN_INFO "amd_pmc_probe: 2\n");
if (dev->cpu_id == AMD_CPU_ID_SP) {
dev_warn_once(dev->dev, "S0i3 is not supported on this hardware\n");
err = -ENODEV;
goto err_pci_dev_put;
}
+ printk(KERN_INFO "amd_pmc_probe: 3\n");
dev->rdev = rdev;
err = amd_smn_read(0, AMD_PMC_BASE_ADDR_LO, &val);
if (err) {
@@ -1073,6 +1081,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
base_addr_lo = val & AMD_PMC_BASE_ADDR_HI_MASK;
+ printk(KERN_INFO "amd_pmc_probe: 4\n");
err = amd_smn_read(0, AMD_PMC_BASE_ADDR_HI, &val);
if (err) {
dev_err(dev->dev, "error reading 0x%x\n", AMD_PMC_BASE_ADDR_HI);
@@ -1085,6 +1094,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
dev->regbase = devm_ioremap(dev->dev, base_addr + AMD_PMC_BASE_ADDR_OFFSET,
AMD_PMC_MAPPING_SIZE);
+ printk(KERN_INFO "amd_pmc_probe: 5\n");
if (!dev->regbase) {
err = -ENOMEM;
goto err_pci_dev_put;
@@ -1095,24 +1105,31 @@ static int amd_pmc_probe(struct platform_device *pdev)
/* Get num of IP blocks within the SoC */
amd_pmc_get_ip_info(dev);
+ printk(KERN_INFO "amd_pmc_probe: 6\n");
if (enable_stb && amd_pmc_is_stb_supported(dev)) {
err = amd_pmc_s2d_init(dev);
+ printk(KERN_INFO "amd_pmc_probe: 6a\n");
if (err)
goto err_pci_dev_put;
}
+ printk(KERN_INFO "amd_pmc_probe: 7\n");
platform_set_drvdata(pdev, dev);
if (IS_ENABLED(CONFIG_SUSPEND)) {
err = acpi_register_lps0_dev(&amd_pmc_s2idle_dev_ops);
+ printk(KERN_INFO "amd_pmc_probe: 7a\n");
if (err)
dev_warn(dev->dev, "failed to register LPS0 sleep handler, expect increased power consumption\n");
if (!disable_workarounds)
amd_pmc_quirks_init(dev);
}
+ printk(KERN_INFO "amd_pmc_probe: 8\n");
amd_pmc_dbgfs_register(dev);
- if (IS_ENABLED(CONFIG_AMD_MP2_STB))
+ if (IS_ENABLED(CONFIG_AMD_MP2_STB)) {
+ printk(KERN_INFO "amd_pmc_probe: calling amd_mp2_stb_init\n");
amd_mp2_stb_init(dev);
+ }
pm_report_max_hw_sleep(U64_MAX);
return 0;
----------------------------------------------------------------------
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 1
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 2
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 3
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 4
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 5
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6
Oct 12 00:20:01 lizard kernel: amd_pmc_is_stb_supported cpu_id: 5352
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init size: 1048576
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init s2d_dram_size ret: -5
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init p_a_l: 0 p_a_hi: 0 s_p_a: 0 sz: 16777216
Oct 12 00:20:01 lizard kernel: ------------[ cut here ]------------
Oct 12 00:20:01 lizard kernel: ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
Oct 12 00:20:01 lizard kernel: WARNING: CPU: 10 PID: 2151 at arch/x86/mm/ioremap.c:217 __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Modules linked in: amd_pmc(+) ccm cpufreq_userspace cpufreq_powersave cpufreq_conservative sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat typec_displayport amdgpu snd_sof_amd_rembrandt amdxcp drm_exec snd_sof_amd_acp gpu_sched btusb snd_sof_pci drm_buddy snd_sof_xtensa_dsp btrtl drm_suballoc_helper snd_hda_codec_realtek amd_atl drm_display_helper btintel intel_rapl_msr snd_sof btbcm intel_rapl_common snd_hda_codec_generic snd_sof_utils cec btmtk snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core uvcvideo mt7921e snd_compress videobuf2_vmalloc rc_core snd_pcm_dmaengine uvc snd_hda_intel mt7921_common drm_ttm_helper videobuf2_memops snd_pci_ps snd_intel_dspcfg snd_rpl_pci_acp6x snd_intel_sdw_acpi mt792x_lib videobuf2_v4l2 snd_pci_acp6x edac_mce_amd ttm snd_pci_acp5x mt76_connac_lib snd_hda_codec snd_rn_pci_acp3x videodev bluetooth drm_kms_helper snd_acp_config mt76 snd_hda_core videobuf2_common snd_soc_acpi i2c_algo_bit mc crc16 snd_hwdep snd_pci_acp3x amd_pmf kvm_amd amdtee mac80211 hid_sensor_als
Oct 12 00:20:01 lizard kernel: hid_sensor_trigger ccp libarc4 ucsi_acpi hid_sensor_iio_common kvm industrialio_triggered_buffer amd_sfh typec_ucsi kfifo_buf leds_cros_ec cros_usbpd_charger tee typec snd_pcsp cros_ec_hwmon platform_profile led_class_multicolor rapl cros_usbpd_notify cfg80211 cros_ec_sysfs industrialio roles cros_usbpd_logger cros_ec_debugfs cros_charge_control cros_ec_chardev wmi_bmof sp5100_tco button ac k10temp watchdog rfkill cpufreq_ondemand snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev i2c_dev sidewinder gameport joydev parport_pc ppdev lp parport efi_pstore configfs nfnetlink ip_tables x_tables autofs4 xfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 cdc_ncm cdc_ether usbnet r8152 mii libphy usbhid raid1 hid_multitouch hid_sensor_hub hid_generic md_mod crct10dif_pclmul i2c_hid_acpi crc32_pclmul xhci_pci i2c_hid crc32c_intel cros_ec_dev xhci_hcd ghash_clmulni_intel
Oct 12 00:20:01 lizard kernel: cros_ec_lpcs sha512_ssse3 cros_ec nvme sha256_ssse3 usbcore drm thunderbolt sha1_ssse3 i2c_piix4 video nvme_core i2c_smbus usb_common battery wmi hid aesni_intel gf128mul crypto_simd cryptd [last unloaded: amd_pmc]
Oct 12 00:20:01 lizard kernel: CPU: 10 UID: 0 PID: 2151 Comm: modprobe Not tainted 6.12.0-rc2+ #8
Oct 12 00:20:01 lizard kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
Oct 12 00:20:01 lizard kernel: RIP: 0010:__ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Code: 1f fe ff ff 80 3d 7e 33 d8 01 00 75 9d 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 9f ae 6c 82 c6 05 64 33 d8 01 01 e8 53 d8 01 00 <0f> 0b e9 79 ff ff ff 83 fd 04 75 35 bf 04 00 00 00 e8 ad a0 ff ff
Oct 12 00:20:01 lizard kernel: RSP: 0018:ffffaee502663a18 EFLAGS: 00010282
Oct 12 00:20:01 lizard kernel: RAX: 0000000000000000 RBX: ffff8fe3066563e8 RCX: 0000000000000027
Oct 12 00:20:01 lizard kernel: RDX: ffff8ff1dff21788 RSI: 0000000000000001 RDI: ffff8ff1dff21780
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: ffffaee502663898
Oct 12 00:20:01 lizard kernel: R10: ffffffff82eb40e8 R11: 0000000000000003 R12: 0000000001000000
Oct 12 00:20:01 lizard kernel: R13: 0000000001000000 R14: 0000000000000000 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel: FS: 00007f5e2c747640(0000) GS:ffff8ff1dff00000(0000) knlGS:0000000000000000
Oct 12 00:20:01 lizard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 12 00:20:01 lizard kernel: CR2: 00007f5e2bf7e700 CR3: 000000010d2a2000 CR4: 0000000000750ef0
Oct 12 00:20:01 lizard kernel: PKRU: 55555554
Oct 12 00:20:01 lizard kernel: Call Trace:
Oct 12 00:20:01 lizard kernel: <TASK>
Oct 12 00:20:01 lizard kernel: ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: ? __warn.cold+0x93/0xf6
Oct 12 00:20:01 lizard kernel: ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: ? report_bug+0xff/0x140
Oct 12 00:20:01 lizard kernel: ? console_unlock+0x9d/0x140
Oct 12 00:20:01 lizard kernel: ? handle_bug+0x58/0x90
Oct 12 00:20:01 lizard kernel: ? exc_invalid_op+0x17/0x70
Oct 12 00:20:01 lizard kernel: ? asm_exc_invalid_op+0x1a/0x20
Oct 12 00:20:01 lizard kernel: ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: ? devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel: ? __pfx_devm_ioremap_release+0x10/0x10
Oct 12 00:20:01 lizard kernel: devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel: amd_pmc_probe+0x41a/0x5ac [amd_pmc]
Oct 12 00:20:01 lizard kernel: platform_probe+0x41/0xa0
Oct 12 00:20:01 lizard kernel: really_probe+0xdb/0x340
Oct 12 00:20:01 lizard kernel: ? pm_runtime_barrier+0x54/0x90
Oct 12 00:20:01 lizard kernel: ? __pfx___driver_attach+0x10/0x10
Oct 12 00:20:01 lizard kernel: __driver_probe_device+0x78/0x110
Oct 12 00:20:01 lizard kernel: driver_probe_device+0x1f/0xa0
Oct 12 00:20:01 lizard kernel: __driver_attach+0xba/0x1c0
Oct 12 00:20:01 lizard kernel: bus_for_each_dev+0x8c/0xe0
Oct 12 00:20:01 lizard kernel: bus_add_driver+0x112/0x1f0
Oct 12 00:20:01 lizard kernel: driver_register+0x72/0xd0
Oct 12 00:20:01 lizard kernel: ? __pfx_amd_pmc_driver_init+0x10/0x10 [amd_pmc]
Oct 12 00:20:01 lizard kernel: do_one_initcall+0x58/0x310
Oct 12 00:20:01 lizard kernel: do_init_module+0x60/0x230
Oct 12 00:20:01 lizard kernel: init_module_from_file+0x86/0xc0
Oct 12 00:20:01 lizard kernel: idempotent_init_module+0x11e/0x310
Oct 12 00:20:01 lizard kernel: __x64_sys_finit_module+0x5e/0xb0
Oct 12 00:20:01 lizard kernel: do_syscall_64+0x82/0x190
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: ? __count_memcg_events+0x53/0xf0
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: ? count_memcg_events.constprop.0+0x1a/0x30
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: ? handle_mm_fault+0x1bb/0x2c0
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: ? do_user_addr_fault+0x36c/0x620
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 12 00:20:01 lizard kernel: RIP: 0033:0x7f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 5c 0d 00 f7 d8 64 89 01 48
Oct 12 00:20:01 lizard kernel: RSP: 002b:00007ffdc1020768 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Oct 12 00:20:01 lizard kernel: RAX: ffffffffffffffda RBX: 000055bc6cec2e70 RCX: 00007f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: RDX: 0000000000000000 RSI: 000055bc6cec3220 RDI: 0000000000000003
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000000 R08: 00007f5e2bff1b20 R09: 0000000000000000
Oct 12 00:20:01 lizard kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 000055bc6cec3220
Oct 12 00:20:01 lizard kernel: R13: 0000000000040000 R14: 000055bc6cec2f10 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel: </TASK>
Oct 12 00:20:01 lizard kernel: ---[ end trace 0000000000000000 ]---
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init dsva: 0000000000000000
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6a
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: probe with driver amd_pmc failed with error -12
----------------------------------------------------------------------
Thank you,
Corey