[AMD Official Use Only - AMD Internal Distribution Only]

+Shyam

-----Original Message-----
From: Corey Hickey <bugfood-ml@xxxxxxxxxx>
Sent: Sunday, October 13, 2024 3:42 AM
To: platform-driver-x86@xxxxxxxxxxxxxxx
Subject: please help with intermittent s2idle problem on AMD laptop

Hello,

I am having an intermittent problem with resuming from s2idle. The problem seems to start when the laptop enters the s2idle state: the laptop appears to be suspended, but the power draw stays high and the laptop remains warm over time. Attempting to resume fails, and I have to fully power off the laptop.

Can somebody please help me troubleshoot this? I am able to test patches and experiment, but I'm out of my depth trying to figure this out on my own. If there is a better place to ask, please let me know.

I first posted about the problem here:
https://community.frame.work/t/linux-framework-16-intermittent-failure-to-resume-from-suspend/58674

System details:
* Framework Laptop 16 (without GPU module)
* Ryzen 7 7840HS
* Debian Sid

The kernel I had trouble with was 6.10.6; I have just updated to git 09f6b0c8904bfaa1e0601bc102e1b6aa6de8c98f (from yesterday) in order to troubleshoot further.

I tried to find some debugging information on my own. The remainder of this message is about that effort; if I'm on the wrong track, please disregard it.

I found this article:
https://www.phoronix.com/news/AMD-MP2-STB-Suspend-Resume
...and hoped it would help me find some useful information. As far as I can tell from the code, I need to load the amd_pmc module with enable_stb=1:

lizard:~# rmmod amd_pmc
lizard:~# modprobe amd_pmc enable_stb=1

If I do that, though:
1. There is an error: 'amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff'
2. There is a kernel WARNING (pasted in full below):
   ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
3. The expected files in debugfs do not appear.
I added some printk statements to the driver to find out what is happening. The trouble seems to be in amd_pmc_s2d_init() and the results it gets back from amd_pmc_send_cmd():

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/amd/pmc/pmc.c#n978

	/* Get DRAM size */
	ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
	printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
	if (ret || !dev->dram_size)
		dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;

	/* Get STB DRAM address */
	amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_LOW, &phys_addr_low, dev->s2d_msg_id, true);
	amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_HIGH, &phys_addr_hi, dev->s2d_msg_id, true);

For the call that retrieves S2D_DRAM_SIZE, the return value is -5 (-EIO). For the call that retrieves S2D_PHYS_ADDR_LOW, the return value is 0, but phys_addr_low is also 0, which seems wrong. For S2D_PHYS_ADDR_HIGH, phys_addr_hi is 0 as well. I think that both phys_addr values being 0 is what triggers the ioremap warning: the combined physical address is 0, so the driver tries to map the 16 MiB fallback size starting at address 0, which is ordinary RAM.

Is this a driver bug, or a hardware limitation?

I will post my debug patch below, followed by the kernel output when loading 'amd_pmc enable_stb=1'.
----------------------------------------------------------------------
commit ed7a2784cf6a19796734b8aca87a260c4ff1f752
Author: Corey Hickey <bugfood-c@xxxxxxxxxx>
Date:   Fri 2024-10-11 23:13:40

    debug

diff --git a/drivers/platform/x86/amd/pmc/mp2_stb.c b/drivers/platform/x86/amd/pmc/mp2_stb.c
index 9775ddc1b27a..718b01266bff 100644
--- a/drivers/platform/x86/amd/pmc/mp2_stb.c
+++ b/drivers/platform/x86/amd/pmc/mp2_stb.c
@@ -228,10 +228,12 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
 	struct pci_dev *pdev;
 	int rc;
 
+	printk(KERN_INFO "amd_mp2_stb_init 1\n");
 	mp2 = devm_kzalloc(dev->dev, sizeof(*mp2), GFP_KERNEL);
 	if (!mp2)
 		return;
 
+	printk(KERN_INFO "amd_mp2_stb_init 2\n");
 	pdev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MP2_STB, NULL);
 	if (!pdev)
 		return;
@@ -239,24 +241,28 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
 	dev->mp2 = mp2;
 	mp2->pdev = pdev;
 
+	printk(KERN_INFO "amd_mp2_stb_init 3");
 	mp2->devres_gid = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
 	if (!mp2->devres_gid) {
 		dev_err(&pdev->dev, "devres_open_group failed\n");
 		goto mp2_error;
 	}
 
+	printk(KERN_INFO "amd_mp2_stb_init 4\n");
 	rc = pcim_enable_device(pdev);
 	if (rc) {
 		dev_err(&pdev->dev, "pcim_enable_device failed\n");
 		goto mp2_error;
 	}
 
+	printk(KERN_INFO "amd_mp2_stb_init 5\n");
 	rc = pcim_iomap_regions(pdev, BIT(MP2_MMIO_BAR), "mp2 stb");
 	if (rc) {
 		dev_err(&pdev->dev, "pcim_iomap_regions failed\n");
 		goto mp2_error;
 	}
 
+	printk(KERN_INFO "amd_mp2_stb_init 6\n");
 	mp2->mmio = pcim_iomap_table(pdev)[MP2_MMIO_BAR];
 	if (!mp2->mmio) {
 		dev_err(&pdev->dev, "pcim_iomap_table failed\n");
@@ -265,6 +271,7 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
 
 	pci_set_master(pdev);
 
+	printk(KERN_INFO "amd_mp2_stb_init 7\n");
 	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
 	if (rc) {
 		dev_err(&pdev->dev, "failed to set DMA mask\n");
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index bbb8edb62e00..6ca497473d78 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -627,6 +627,7 @@ static void amd_pmc_dbgfs_unregister(struct amd_pmc_dev *dev)
 
 static bool amd_pmc_is_stb_supported(struct amd_pmc_dev *dev)
 {
+	printk(KERN_INFO "amd_pmc_is_stb_supported cpu_id: %d\n", dev->cpu_id);
 	switch (dev->cpu_id) {
 	case AMD_CPU_ID_YC:
 	case AMD_CPU_ID_CB:
@@ -986,11 +987,13 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
 	dev->msg_port = 1;
 
 	amd_pmc_send_cmd(dev, S2D_TELEMETRY_SIZE, &size, dev->s2d_msg_id, true);
+	printk(KERN_INFO "amd_pmc_s2d_init size: %u\n", size);
 	if (size != S2D_TELEMETRY_BYTES_MAX)
 		return -EIO;
 
 	/* Get DRAM size */
 	ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
+	printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
 	if (ret || !dev->dram_size)
 		dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;
 
@@ -1003,7 +1006,9 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
 	/* Clear msg_port for other SMU operation */
 	dev->msg_port = 0;
 
+	printk(KERN_INFO "amd_pmc_s2d_init p_a_l: %u p_a_hi: %u s_p_a: %llu sz: %u\n", phys_addr_low, phys_addr_hi, stb_phys_addr, dev->dram_size);
 	dev->stb_virt_addr = devm_ioremap(dev->dev, stb_phys_addr, dev->dram_size);
+	printk(KERN_INFO "amd_pmc_s2d_init dsva: %p\n", dev->stb_virt_addr);
 	if (!dev->stb_virt_addr)
 		return -ENOMEM;
 
@@ -1047,6 +1052,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
 	int err;
 	u32 val;
 
+	printk(KERN_INFO "amd_pmc_probe: 1\n");
 	dev->dev = &pdev->dev;
 
 	rdev = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
@@ -1057,12 +1063,14 @@ static int amd_pmc_probe(struct platform_device *pdev)
 
 	dev->cpu_id = rdev->device;
 
+	printk(KERN_INFO "amd_pmc_probe: 2\n");
 	if (dev->cpu_id == AMD_CPU_ID_SP) {
 		dev_warn_once(dev->dev, "S0i3 is not supported on this hardware\n");
 		err = -ENODEV;
 		goto err_pci_dev_put;
 	}
 
+	printk(KERN_INFO "amd_pmc_probe: 3\n");
 	dev->rdev = rdev;
 	err = amd_smn_read(0, AMD_PMC_BASE_ADDR_LO, &val);
 	if (err) {
@@ -1073,6 +1081,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
 
 	base_addr_lo = val & AMD_PMC_BASE_ADDR_HI_MASK;
 
+	printk(KERN_INFO "amd_pmc_probe: 4\n");
 	err = amd_smn_read(0, AMD_PMC_BASE_ADDR_HI, &val);
 	if (err) {
 		dev_err(dev->dev, "error reading 0x%x\n", AMD_PMC_BASE_ADDR_HI);
@@ -1085,6 +1094,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
 
 	dev->regbase = devm_ioremap(dev->dev, base_addr + AMD_PMC_BASE_ADDR_OFFSET,
 				    AMD_PMC_MAPPING_SIZE);
+	printk(KERN_INFO "amd_pmc_probe: 5\n");
 	if (!dev->regbase) {
 		err = -ENOMEM;
 		goto err_pci_dev_put;
@@ -1095,24 +1105,31 @@ static int amd_pmc_probe(struct platform_device *pdev)
 	/* Get num of IP blocks within the SoC */
 	amd_pmc_get_ip_info(dev);
 
+	printk(KERN_INFO "amd_pmc_probe: 6\n");
 	if (enable_stb && amd_pmc_is_stb_supported(dev)) {
 		err = amd_pmc_s2d_init(dev);
+		printk(KERN_INFO "amd_pmc_probe: 6a\n");
 		if (err)
 			goto err_pci_dev_put;
 	}
 
+	printk(KERN_INFO "amd_pmc_probe: 7\n");
 	platform_set_drvdata(pdev, dev);
 	if (IS_ENABLED(CONFIG_SUSPEND)) {
 		err = acpi_register_lps0_dev(&amd_pmc_s2idle_dev_ops);
+		printk(KERN_INFO "amd_pmc_probe: 7a\n");
 		if (err)
 			dev_warn(dev->dev, "failed to register LPS0 sleep handler, expect increased power consumption\n");
 		if (!disable_workarounds)
 			amd_pmc_quirks_init(dev);
 	}
 
+	printk(KERN_INFO "amd_pmc_probe: 8\n");
 	amd_pmc_dbgfs_register(dev);
-	if (IS_ENABLED(CONFIG_AMD_MP2_STB))
+	if (IS_ENABLED(CONFIG_AMD_MP2_STB)) {
+		printk(KERN_INFO "amd_pmc_probe: calling amd_mp2_stb_init\n");
 		amd_mp2_stb_init(dev);
+	}
 	pm_report_max_hw_sleep(U64_MAX);
 	return 0;
 
----------------------------------------------------------------------
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 1
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 2
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 3
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 4
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 5
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6
Oct 12 00:20:01 lizard kernel: amd_pmc_is_stb_supported cpu_id: 5352
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init size: 1048576
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init s2d_dram_size ret: -5
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init p_a_l: 0 p_a_hi: 0 s_p_a: 0 sz: 16777216
Oct 12 00:20:01 lizard kernel: ------------[ cut here ]------------
Oct 12 00:20:01 lizard kernel: ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
Oct 12 00:20:01 lizard kernel: WARNING: CPU: 10 PID: 2151 at arch/x86/mm/ioremap.c:217 __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Modules linked in: amd_pmc(+) ccm cpufreq_userspace cpufreq_powersave cpufreq_conservative sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat typec_displayport amdgpu snd_sof_amd_rembrandt amdxcp drm_exec snd_sof_amd_acp gpu_sched btusb snd_sof_pci drm_buddy snd_sof_xtensa_dsp btrtl drm_suballoc_helper snd_hda_codec_realtek amd_atl drm_display_helper btintel intel_rapl_msr snd_sof btbcm intel_rapl_common snd_hda_codec_generic snd_sof_utils cec btmtk snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core uvcvideo mt7921e snd_compress videobuf2_vmalloc rc_core snd_pcm_dmaengine uvc snd_hda_intel mt7921_common drm_ttm_helper videobuf2_memops snd_pci_ps snd_intel_dspcfg snd_rpl_pci_acp6x snd_intel_sdw_acpi mt792x_lib videobuf2_v4l2 snd_pci_acp6x edac_mce_amd ttm snd_pci_acp5x mt76_connac_lib snd_hda_codec snd_rn_pci_acp3x videodev bluetooth drm_kms_helper snd_acp_config mt76 snd_hda_core videobuf2_common snd_soc_acpi i2c_algo_bit mc crc16 snd_hwdep snd_pci_acp3x amd_pmf kvm_amd amdtee mac80211 hid_sensor_als
Oct 12 00:20:01 lizard kernel:  hid_sensor_trigger ccp libarc4 ucsi_acpi hid_sensor_iio_common kvm industrialio_triggered_buffer amd_sfh typec_ucsi kfifo_buf leds_cros_ec cros_usbpd_charger tee typec snd_pcsp cros_ec_hwmon platform_profile led_class_multicolor rapl cros_usbpd_notify cfg80211 cros_ec_sysfs industrialio roles cros_usbpd_logger cros_ec_debugfs cros_charge_control cros_ec_chardev wmi_bmof sp5100_tco button ac k10temp watchdog rfkill cpufreq_ondemand snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev i2c_dev sidewinder gameport joydev parport_pc ppdev lp parport efi_pstore configfs nfnetlink ip_tables x_tables autofs4 xfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 cdc_ncm cdc_ether usbnet r8152 mii libphy usbhid raid1 hid_multitouch hid_sensor_hub hid_generic md_mod crct10dif_pclmul i2c_hid_acpi crc32_pclmul xhci_pci i2c_hid crc32c_intel cros_ec_dev xhci_hcd ghash_clmulni_intel
Oct 12 00:20:01 lizard kernel:  cros_ec_lpcs sha512_ssse3 cros_ec nvme sha256_ssse3 usbcore drm thunderbolt sha1_ssse3 i2c_piix4 video nvme_core i2c_smbus usb_common battery wmi hid aesni_intel gf128mul crypto_simd cryptd [last unloaded: amd_pmc]
Oct 12 00:20:01 lizard kernel: CPU: 10 UID: 0 PID: 2151 Comm: modprobe Not tainted 6.12.0-rc2+ #8
Oct 12 00:20:01 lizard kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
Oct 12 00:20:01 lizard kernel: RIP: 0010:__ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Code: 1f fe ff ff 80 3d 7e 33 d8 01 00 75 9d 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 9f ae 6c 82 c6 05 64 33 d8 01 01 e8 53 d8 01 00 <0f> 0b e9 79 ff ff ff 83 fd 04 75 35 bf 04 00 00 00 e8 ad a0 ff ff
Oct 12 00:20:01 lizard kernel: RSP: 0018:ffffaee502663a18 EFLAGS: 00010282
Oct 12 00:20:01 lizard kernel: RAX: 0000000000000000 RBX: ffff8fe3066563e8 RCX: 0000000000000027
Oct 12 00:20:01 lizard kernel: RDX: ffff8ff1dff21788 RSI: 0000000000000001 RDI: ffff8ff1dff21780
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: ffffaee502663898
Oct 12 00:20:01 lizard kernel: R10: ffffffff82eb40e8 R11: 0000000000000003 R12: 0000000001000000
Oct 12 00:20:01 lizard kernel: R13: 0000000001000000 R14: 0000000000000000 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel: FS:  00007f5e2c747640(0000) GS:ffff8ff1dff00000(0000) knlGS:0000000000000000
Oct 12 00:20:01 lizard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 12 00:20:01 lizard kernel: CR2: 00007f5e2bf7e700 CR3: 000000010d2a2000 CR4: 0000000000750ef0
Oct 12 00:20:01 lizard kernel: PKRU: 55555554
Oct 12 00:20:01 lizard kernel: Call Trace:
Oct 12 00:20:01 lizard kernel:  <TASK>
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? __warn.cold+0x93/0xf6
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? report_bug+0xff/0x140
Oct 12 00:20:01 lizard kernel:  ? console_unlock+0x9d/0x140
Oct 12 00:20:01 lizard kernel:  ? handle_bug+0x58/0x90
Oct 12 00:20:01 lizard kernel:  ? exc_invalid_op+0x17/0x70
Oct 12 00:20:01 lizard kernel:  ? asm_exc_invalid_op+0x1a/0x20
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel:  ? __pfx_devm_ioremap_release+0x10/0x10
Oct 12 00:20:01 lizard kernel:  devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel:  amd_pmc_probe+0x41a/0x5ac [amd_pmc]
Oct 12 00:20:01 lizard kernel:  platform_probe+0x41/0xa0
Oct 12 00:20:01 lizard kernel:  really_probe+0xdb/0x340
Oct 12 00:20:01 lizard kernel:  ? pm_runtime_barrier+0x54/0x90
Oct 12 00:20:01 lizard kernel:  ? __pfx___driver_attach+0x10/0x10
Oct 12 00:20:01 lizard kernel:  __driver_probe_device+0x78/0x110
Oct 12 00:20:01 lizard kernel:  driver_probe_device+0x1f/0xa0
Oct 12 00:20:01 lizard kernel:  __driver_attach+0xba/0x1c0
Oct 12 00:20:01 lizard kernel:  bus_for_each_dev+0x8c/0xe0
Oct 12 00:20:01 lizard kernel:  bus_add_driver+0x112/0x1f0
Oct 12 00:20:01 lizard kernel:  driver_register+0x72/0xd0
Oct 12 00:20:01 lizard kernel:  ? __pfx_amd_pmc_driver_init+0x10/0x10 [amd_pmc]
Oct 12 00:20:01 lizard kernel:  do_one_initcall+0x58/0x310
Oct 12 00:20:01 lizard kernel:  do_init_module+0x60/0x230
Oct 12 00:20:01 lizard kernel:  init_module_from_file+0x86/0xc0
Oct 12 00:20:01 lizard kernel:  idempotent_init_module+0x11e/0x310
Oct 12 00:20:01 lizard kernel:  __x64_sys_finit_module+0x5e/0xb0
Oct 12 00:20:01 lizard kernel:  do_syscall_64+0x82/0x190
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? __count_memcg_events+0x53/0xf0
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? handle_mm_fault+0x1bb/0x2c0
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? do_user_addr_fault+0x36c/0x620
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 12 00:20:01 lizard kernel: RIP: 0033:0x7f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 5c 0d 00 f7 d8 64 89 01 48
Oct 12 00:20:01 lizard kernel: RSP: 002b:00007ffdc1020768 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Oct 12 00:20:01 lizard kernel: RAX: ffffffffffffffda RBX: 000055bc6cec2e70 RCX: 00007f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: RDX: 0000000000000000 RSI: 000055bc6cec3220 RDI: 0000000000000003
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000000 R08: 00007f5e2bff1b20 R09: 0000000000000000
Oct 12 00:20:01 lizard kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 000055bc6cec3220
Oct 12 00:20:01 lizard kernel: R13: 0000000000040000 R14: 000055bc6cec2f10 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel:  </TASK>
Oct 12 00:20:01 lizard kernel: ---[ end trace 0000000000000000 ]---
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init dsva: 0000000000000000
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6a
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: probe with driver amd_pmc failed with error -12
----------------------------------------------------------------------

Thank you,
Corey