On 10/14/2024 12:17, Goswami, Sanket wrote:
+Shyam
-----Original Message-----
From: Corey Hickey <bugfood-ml@xxxxxxxxxx>
Sent: Sunday, October 13, 2024 3:42 AM
To: platform-driver-x86@xxxxxxxxxxxxxxx
Subject: please help with intermittent s2idle problem on AMD laptop
Hello,
I am having an intermittent problem with resuming from s2idle. The problem seems to be with entering the s2idle state: the laptop appears suspended, but the power draw is high and the laptop remains warm over time. Attempting to resume fails; I need to fully power off the laptop.
Can somebody please help me troubleshoot this? I am able to test patches and experiment, but I'm out of my depth with trying to figure this out on my own.
If there is a better place to ask this, please let me know.
I first posted about the problem here:
https://community.frame.work/t/linux-framework-16-intermittent-failure-to-resume-from-suspend/58674
System details are:
* Framework Laptop 16 (without GPU module)
* Ryzen 7 7840HS
* Debian Sid
The kernel I had trouble with was 6.10.6; I have just recently updated the kernel to git 09f6b0c8904bfaa1e0601bc102e1b6aa6de8c98f (from
yesterday) in order to try to troubleshoot further.
I tried to find some debugging information on my own. The remainder of this message is about that effort, but if I'm on the wrong track, please disregard the following.
I found this article:
https://www.phoronix.com/news/AMD-MP2-STB-Suspend-Resume
...and hoped I would be able to find some useful information.
As far as I can tell from the code, I need to load the amd_pmc module with enable_stb=1.
lizard:~# rmmod amd_pmc
lizard:~# modprobe amd_pmc enable_stb=1
If I do that, though:
1. There is an error: 'amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff'
this is expected as the command is not supported on PMFW loaded on
your system.
and..
ret=-5 is expected on your system, because it does not support EFR
(Enhanced Firmware Reporting).
2. There is a kernel WARNING (which I will paste in full below):
ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
3. The expected files in debugfs do not appear.
This happens because ioremap() is being called for address 0x0.
Ideally you should have got the physical address from the mailbox
command. But that does not seem to happen.
I suspect that on your system, the STB is not enabled. Can you check
the following path to see if that helps?
AMD CBS -> SMU Debug Options -> SMU Feature Config Limits -> STB To
DRAM Log <Enabled>
If DRAM log is disabled, then that should be enabled to attempt to
take a stb log.
I added some printk statements to the driver in order to try to find out what is happening.
The trouble seems to be in amd_pmc_s2d_init() and the results it gets back from calling amd_pmc_send_cmd()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/amd/pmc/pmc.c#n978
/* Get DRAM size */
ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
if (ret || !dev->dram_size)
dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;
/* Get STB DRAM address */
amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_LOW, &phys_addr_low, dev->s2d_msg_id, true);
amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_HIGH, &phys_addr_hi, dev->s2d_msg_id, true);
For the call to retrieve S2D_DRAM_SIZE, the return value is -5.
For the call to retrieve S2D_PHYS_ADDR_LOW, the return value is 0, but phys_addr_low is 0 as well, which seems wrong.
For S2D_PHYS_ADDR_HIGH, phys_addr_hi is 0 as well.
I think that both of the phys_addr values being 0 is resulting in the warning from ioremap.
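The reasoning here can be sketched in plain C. This is a userspace illustration, not driver code: it assumes the driver joins the two 32-bit halves into one 64-bit physical address (high half shifted left by 32) and then maps dram_size bytes starting there. The function names and the STB_FALLBACK_DRAM_SIZE macro are made up for this example; the value 0x1000000 is the 16777216 that the debug printk reports as "sz".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size reported as "sz: 16777216" by the debug printk, i.e. 16 MiB. */
#define STB_FALLBACK_DRAM_SIZE 0x1000000u

/* Join the two 32-bit halves returned by the SMU into one 64-bit address. */
uint64_t stb_combine_addr(uint32_t low, uint32_t hi)
{
    return ((uint64_t)hi << 32) | low;
}

/* Hypothetical guard: an all-zero address means no buffer was reported. */
bool stb_addr_is_usable(uint64_t addr)
{
    return addr != 0;
}

/* Last byte of the range that would be mapped for a given base and size. */
uint64_t stb_range_end(uint64_t base, uint32_t size)
{
    assert(size > 0);
    return base + size - 1;
}
```

With both halves 0 and the 16 MiB size, the range works out to 0x0 - 0xffffff, which is the range named in the ioremap warning. A check like stb_addr_is_usable() before mapping would avoid the warning, though whether the driver should carry such a check is a separate question.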
Is this a driver bug, or a hardware limitation?
I will post my debug patch below and then the output from the kernel when loading 'amd_pmc enable_stb=1'.
----------------------------------------------------------------------
commit ed7a2784cf6a19796734b8aca87a260c4ff1f752
Author: Corey Hickey <bugfood-c@xxxxxxxxxx>
Date: Fri 2024-10-11 23:13:40
debug
diff --git a/drivers/platform/x86/amd/pmc/mp2_stb.c b/drivers/platform/x86/amd/pmc/mp2_stb.c
index 9775ddc1b27a..718b01266bff 100644
--- a/drivers/platform/x86/amd/pmc/mp2_stb.c
+++ b/drivers/platform/x86/amd/pmc/mp2_stb.c
@@ -228,10 +228,12 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
struct pci_dev *pdev;
int rc;
No need to look at mp2_stb.c as it is meant for chromebook use-cases.
So, it will not take this path on your framework system.
Note that I have looked at your debug patch, but it may not be in the
right direction.
I would suggest:
- reload the amd_pmc driver with dyndbg
- Put the system to sleep "echo mem > /sys/power/state" and take the
dmesg logs
- get the dump of /sys/kernel/debug/amd_pmc/s0ix_stats and
/sys/kernel/debug/amd_pmc/smu_fw_info
If the dmesg and debugfs logs are not helpful, then you can enable the
BIOS settings as described above to take the STB log.
The STB log can be obtained with
"cat /sys/kernel/debug/amd_pmc/stb_read > stb_data.bin"; please attach
that file on bugzilla.
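Plain cat is enough for the two debugfs files. If a small helper is preferred for collecting them in one go, a minimal sketch in C could look like this. The function names are invented for the example; the two paths are the ones named above. Reading them normally requires root and a mounted debugfs.

```c
#include <stdio.h>

/*
 * Copy one file to `out`, preceded by a header line naming it.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int dump_file(const char *path, FILE *out)
{
    char buf[4096];
    size_t n;
    int err;
    FILE *in = fopen(path, "r");

    if (!in)
        return -1;

    fprintf(out, "==> %s <==\n", path);
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);

    err = ferror(in);
    fclose(in);
    return err ? -1 : 0;
}

/* Dump both amd_pmc debugfs files; returns how many could not be read. */
int dump_amd_pmc_debugfs(FILE *out)
{
    static const char *const paths[] = {
        "/sys/kernel/debug/amd_pmc/s0ix_stats",
        "/sys/kernel/debug/amd_pmc/smu_fw_info",
    };
    int failed = 0;
    size_t i;

    for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
        if (dump_file(paths[i], out) != 0)
            failed++;
    return failed;
}
```

Calling dump_amd_pmc_debugfs(stdout) right after a suspend/resume cycle and redirecting the output to a file gives one attachment with both dumps.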
Thanks,
Shyam
+ printk(KERN_INFO "amd_mp2_stb_init 1\n");
mp2 = devm_kzalloc(dev->dev, sizeof(*mp2), GFP_KERNEL);
if (!mp2)
return;
+ printk(KERN_INFO "amd_mp2_stb_init 2\n");
pdev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MP2_STB, NULL);
if (!pdev)
return;
@@ -239,24 +241,28 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
dev->mp2 = mp2;
mp2->pdev = pdev;
+ printk(KERN_INFO "amd_mp2_stb_init 3");
mp2->devres_gid = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
if (!mp2->devres_gid) {
dev_err(&pdev->dev, "devres_open_group failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 4\n");
rc = pcim_enable_device(pdev);
if (rc) {
dev_err(&pdev->dev, "pcim_enable_device failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 5\n");
rc = pcim_iomap_regions(pdev, BIT(MP2_MMIO_BAR), "mp2 stb");
if (rc) {
dev_err(&pdev->dev, "pcim_iomap_regions failed\n");
goto mp2_error;
}
+ printk(KERN_INFO "amd_mp2_stb_init 6\n");
mp2->mmio = pcim_iomap_table(pdev)[MP2_MMIO_BAR];
if (!mp2->mmio) {
dev_err(&pdev->dev, "pcim_iomap_table failed\n");
@@ -265,6 +271,7 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
pci_set_master(pdev);
+ printk(KERN_INFO "amd_mp2_stb_init 7\n");
rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (rc) {
dev_err(&pdev->dev, "failed to set DMA mask\n");
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index bbb8edb62e00..6ca497473d78 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -627,6 +627,7 @@ static void amd_pmc_dbgfs_unregister(struct amd_pmc_dev *dev)
static bool amd_pmc_is_stb_supported(struct amd_pmc_dev *dev)
{
+ printk(KERN_INFO "amd_pmc_is_stb_supported cpu_id: %d\n",
+ dev->cpu_id);
switch (dev->cpu_id) {
case AMD_CPU_ID_YC:
case AMD_CPU_ID_CB:
@@ -986,11 +987,13 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
dev->msg_port = 1;
amd_pmc_send_cmd(dev, S2D_TELEMETRY_SIZE, &size, dev->s2d_msg_id, true);
+ printk(KERN_INFO "amd_pmc_s2d_init size: %u\n", size);
if (size != S2D_TELEMETRY_BYTES_MAX)
return -EIO;
/* Get DRAM size */
ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
+ printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
if (ret || !dev->dram_size)
dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;
@@ -1003,7 +1006,9 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
/* Clear msg_port for other SMU operation */
dev->msg_port = 0;
+ printk(KERN_INFO "amd_pmc_s2d_init p_a_l: %u p_a_hi: %u s_p_a: %llu sz: %u\n", phys_addr_low, phys_addr_hi, stb_phys_addr, dev->dram_size);
dev->stb_virt_addr = devm_ioremap(dev->dev, stb_phys_addr, dev->dram_size);
+ printk(KERN_INFO "amd_pmc_s2d_init dsva: %p\n", dev->stb_virt_addr);
if (!dev->stb_virt_addr)
return -ENOMEM;
@@ -1047,6 +1052,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
int err;
u32 val;
+ printk(KERN_INFO "amd_pmc_probe: 1\n");
dev->dev = &pdev->dev;
rdev = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
@@ -1057,12 +1063,14 @@ static int amd_pmc_probe(struct platform_device *pdev)
dev->cpu_id = rdev->device;
+ printk(KERN_INFO "amd_pmc_probe: 2\n");
if (dev->cpu_id == AMD_CPU_ID_SP) {
dev_warn_once(dev->dev, "S0i3 is not supported on this hardware\n");
err = -ENODEV;
goto err_pci_dev_put;
}
+ printk(KERN_INFO "amd_pmc_probe: 3\n");
dev->rdev = rdev;
err = amd_smn_read(0, AMD_PMC_BASE_ADDR_LO, &val);
if (err) {
@@ -1073,6 +1081,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
base_addr_lo = val & AMD_PMC_BASE_ADDR_HI_MASK;
+ printk(KERN_INFO "amd_pmc_probe: 4\n");
err = amd_smn_read(0, AMD_PMC_BASE_ADDR_HI, &val);
if (err) {
dev_err(dev->dev, "error reading 0x%x\n", AMD_PMC_BASE_ADDR_HI);
@@ -1085,6 +1094,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
dev->regbase = devm_ioremap(dev->dev, base_addr + AMD_PMC_BASE_ADDR_OFFSET,
AMD_PMC_MAPPING_SIZE);
+ printk(KERN_INFO "amd_pmc_probe: 5\n");
if (!dev->regbase) {
err = -ENOMEM;
goto err_pci_dev_put;
@@ -1095,24 +1105,31 @@ static int amd_pmc_probe(struct platform_device *pdev)
/* Get num of IP blocks within the SoC */
amd_pmc_get_ip_info(dev);
+ printk(KERN_INFO "amd_pmc_probe: 6\n");
if (enable_stb && amd_pmc_is_stb_supported(dev)) {
err = amd_pmc_s2d_init(dev);
+ printk(KERN_INFO "amd_pmc_probe: 6a\n");
if (err)
goto err_pci_dev_put;
}
+ printk(KERN_INFO "amd_pmc_probe: 7\n");
platform_set_drvdata(pdev, dev);
if (IS_ENABLED(CONFIG_SUSPEND)) {
err = acpi_register_lps0_dev(&amd_pmc_s2idle_dev_ops);
+ printk(KERN_INFO "amd_pmc_probe: 7a\n");
if (err)
dev_warn(dev->dev, "failed to register LPS0 sleep handler, expect increased power consumption\n");
if (!disable_workarounds)
amd_pmc_quirks_init(dev);
}
+ printk(KERN_INFO "amd_pmc_probe: 8\n");
amd_pmc_dbgfs_register(dev);
- if (IS_ENABLED(CONFIG_AMD_MP2_STB))
+ if (IS_ENABLED(CONFIG_AMD_MP2_STB)) {
+ printk(KERN_INFO "amd_pmc_probe: calling amd_mp2_stb_init\n");
amd_mp2_stb_init(dev);
+ }
pm_report_max_hw_sleep(U64_MAX);
return 0;
----------------------------------------------------------------------
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 1
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 2
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 3
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 4
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 5
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6
Oct 12 00:20:01 lizard kernel: amd_pmc_is_stb_supported cpu_id: 5352
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init size: 1048576
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init s2d_dram_size ret: -5
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init p_a_l: 0 p_a_hi: 0 s_p_a: 0 sz: 16777216
Oct 12 00:20:01 lizard kernel: ------------[ cut here ]------------
Oct 12 00:20:01 lizard kernel: ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
Oct 12 00:20:01 lizard kernel: WARNING: CPU: 10 PID: 2151 at arch/x86/mm/ioremap.c:217 __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Modules linked in: amd_pmc(+) ccm cpufreq_userspace cpufreq_powersave cpufreq_conservative sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat typec_displayport amdgpu snd_sof_amd_rembrandt amdxcp drm_exec snd_sof_amd_acp gpu_sched btusb snd_sof_pci drm_buddy snd_sof_xtensa_dsp btrtl drm_suballoc_helper snd_hda_codec_realtek amd_atl drm_display_helper btintel intel_rapl_msr snd_sof btbcm intel_rapl_common snd_hda_codec_generic snd_sof_utils cec btmtk snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core uvcvideo mt7921e snd_compress videobuf2_vmalloc rc_core snd_pcm_dmaengine uvc snd_hda_intel mt7921_common drm_ttm_helper videobuf2_memops snd_pci_ps snd_intel_dspcfg snd_rpl_pci_acp6x snd_intel_sdw_acpi mt792x_lib videobuf2_v4l2 snd_pci_acp6x edac_mce_amd ttm snd_pci_acp5x mt76_connac_lib snd_hda_codec snd_rn_pci_acp3x videodev bluetooth drm_kms_helper snd_acp_config mt76 snd_hda_core videobuf2_common snd_soc_acpi i2c_algo_bit mc crc16 snd_hwdep snd_pci_acp3x amd_pmf kvm_amd amdtee mac80211 hid_sensor_als
Oct 12 00:20:01 lizard kernel: hid_sensor_trigger ccp libarc4 ucs