Re: please help with intermittent s2idle problem on AMD laptop

On 2024-10-14 09:40, Mario Limonciello wrote:
On 10/14/2024 01:58, Shyam Sundar S K wrote:


On 10/14/2024 12:17, Goswami, Sanket wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

+Shyam

-----Original Message-----
From: Corey Hickey <bugfood-ml@xxxxxxxxxx>
Sent: Sunday, October 13, 2024 3:42 AM
To: platform-driver-x86@xxxxxxxxxxxxxxx
Subject: please help with intermittent s2idle problem on AMD laptop

Hello,

I am having an intermittent problem with resuming from s2idle. There seems to be a problem with going into the s2idle state--the laptop appears suspended, but the power draw is high and laptop remains warm over time. Attempting to resume fails; I need to fully power off the laptop.

Can somebody please help me troubleshoot this? I am able to test patches and experiment, but I'm out of my depth with trying to figure this out on my own.

If there is a better place to ask this, please let me know.

I first posted about the problem here:

https://community.frame.work/t/linux-framework-16-intermittent-failure-to-resume-from-suspend/58674

System details are:
* Framework Laptop 16 (without GPU module)
* Ryzen 7 7840HS
* Debian Sid

The kernel I had trouble with was 6.10.6; I have just recently updated the kernel to git 09f6b0c8904bfaa1e0601bc102e1b6aa6de8c98f (from
yesterday) in order to try to troubleshoot further.


I tried to find some debugging information on my own. The remainder of this message is about that effort, but if I'm on the wrong track, please disregard the following.


I found this article:
https://www.phoronix.com/news/AMD-MP2-STB-Suspend-Resume
...and hoped I would be able to find some useful information.

As far as I can tell from the code, I need to load the amd_pmc module with enable_stb=1.

lizard:~# rmmod amd_pmc
lizard:~# modprobe amd_pmc enable_stb=1

If I do that, though:
1. There is an error: 'amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff'

This is expected, as the command is not supported by the PMFW loaded
on your system.

and..

ret=-5 is expected on your system, because it does not support EFR
(Enhanced Firmware Reporting).

2. There is a kernel WARNING (which I will paste in full below):
      ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
3. The expected files in debugfs do not appear.


This happens because ioremap() is being called on address 0x0.
Ideally the physical address would have come from the mailbox
command, but that does not seem to have happened.

I suspect that on your system, the STB is not enabled. Can you check
the following path to see if that helps?

AMD CBS -> SMU Debug Options -> SMU Feature Config Limits -> STB To
DRAM Log <Enabled>

If DRAM log is disabled, then that should be enabled to attempt to
take a stb log.


I added some printk statements to the driver in order to try to find out what is happening.

The trouble seems to be in amd_pmc_s2d_init() and the results it gets back from calling amd_pmc_send_cmd()


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/amd/pmc/pmc.c#n978


           /* Get DRAM size */
           ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
           printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
           if (ret || !dev->dram_size)
                   dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;

           /* Get STB DRAM address */
           amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_LOW, &phys_addr_low, dev->s2d_msg_id, true);
           amd_pmc_send_cmd(dev, S2D_PHYS_ADDR_HIGH, &phys_addr_hi, dev->s2d_msg_id, true);


For the call to retrieve S2D_DRAM_SIZE, the return value is -5.
For the call to retrieve S2D_PHYS_ADDR_LOW, the return value is 0, but phys_addr_low is 0 as well, which seems wrong.
For S2D_PHYS_ADDR_HIGH, phys_addr_hi is 0 as well.

I think that both of the phys_addr values being 0 is resulting in the warning from ioremap.
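
To illustrate the failure mode described above, here is a minimal user-space sketch in C. It is not the driver's actual code. It assumes the 64-bit STB address is formed by shifting the high word left by 32 bits and OR-ing in the low word; the function names stb_combine_phys_addr() and stb_phys_addr_is_usable() are hypothetical, and the zero check is only one possible guard, not a proposed fix.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Combine the two 32-bit halves (as returned for S2D_PHYS_ADDR_LOW and
 * S2D_PHYS_ADDR_HIGH) into one 64-bit physical address.
 * Assumption: high word in bits 63..32, low word in bits 31..0.
 */
uint64_t stb_combine_phys_addr(uint32_t phys_addr_hi, uint32_t phys_addr_low)
{
    return ((uint64_t)phys_addr_hi << 32) | phys_addr_low;
}

/*
 * Hypothetical guard: treat an all-zero address as "no STB buffer was
 * reported", so a caller could skip the mapping instead of handing
 * 0x0 to devm_ioremap().
 */
bool stb_phys_addr_is_usable(uint64_t stb_phys_addr)
{
    return stb_phys_addr != 0;
}
```

With both mailbox results at 0, the combined address is 0x0, and mapping 16777216 bytes from there covers 0x0 - 0xffffff, which matches the range in the ioremap warning.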

Is this a driver bug, or a hardware limitation?

I will post my debug patch below and then the output from the kernel when loading 'amd_pmc enable_stb=1'.


----------------------------------------------------------------------
commit ed7a2784cf6a19796734b8aca87a260c4ff1f752
Author: Corey Hickey <bugfood-c@xxxxxxxxxx>
Date:   Fri 2024-10-11 23:13:40

       debug

diff --git a/drivers/platform/x86/amd/pmc/mp2_stb.c b/drivers/platform/x86/amd/pmc/mp2_stb.c
index 9775ddc1b27a..718b01266bff 100644
--- a/drivers/platform/x86/amd/pmc/mp2_stb.c
+++ b/drivers/platform/x86/amd/pmc/mp2_stb.c
@@ -228,10 +228,12 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
          struct pci_dev *pdev;
          int rc;


No need to look at mp2_stb.c as it is meant for chromebook use-cases.
So, it will not take this path on your framework system.

Note that I have looked at your debug patch, but it may not be in the
right direction.

I would suggest:
- reload the amd_pmc driver with dyndbg
- Put the system to sleep "echo mem > /sys/power/state" and take the
dmesg logs
- get the dump of /sys/kernel/debug/amd_pmc/s0ix_stats and
/sys/kernel/debug/amd_pmc/smu_fw_info

if the dmesg and debugfs logs are not helpful, then you can enable the
BIOS settings as described above to take the STB log.

The STB log can be obtained with
  cat /sys/kernel/debug/amd_pmc/stb_read > stb_data.bin
Please put that info on bugzilla.

Thanks,
Shyam

+    printk(KERN_INFO "amd_mp2_stb_init 1\n");
          mp2 = devm_kzalloc(dev->dev, sizeof(*mp2), GFP_KERNEL);
          if (!mp2)
                  return;

+    printk(KERN_INFO "amd_mp2_stb_init 2\n");
          pdev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MP2_STB, NULL);
          if (!pdev)
                  return;
@@ -239,24 +241,28 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)
          dev->mp2 = mp2;
          mp2->pdev = pdev;

+    printk(KERN_INFO "amd_mp2_stb_init 3");
          mp2->devres_gid = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
          if (!mp2->devres_gid) {
                  dev_err(&pdev->dev, "devres_open_group failed\n");
                  goto mp2_error;
          }

+    printk(KERN_INFO "amd_mp2_stb_init 4\n");
          rc = pcim_enable_device(pdev);
          if (rc) {
                  dev_err(&pdev->dev, "pcim_enable_device failed\n");
                  goto mp2_error;
          }

+    printk(KERN_INFO "amd_mp2_stb_init 5\n");
          rc = pcim_iomap_regions(pdev, BIT(MP2_MMIO_BAR), "mp2 stb");
          if (rc) {
                  dev_err(&pdev->dev, "pcim_iomap_regions failed\n");
                  goto mp2_error;
          }

+    printk(KERN_INFO "amd_mp2_stb_init 6\n");
          mp2->mmio = pcim_iomap_table(pdev)[MP2_MMIO_BAR];
          if (!mp2->mmio) {
                  dev_err(&pdev->dev, "pcim_iomap_table failed\n");
@@ -265,6 +271,7 @@ void amd_mp2_stb_init(struct amd_pmc_dev *dev)

          pci_set_master(pdev);

+    printk(KERN_INFO "amd_mp2_stb_init 7\n");
          rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
          if (rc) {
                  dev_err(&pdev->dev, "failed to set DMA mask\n");
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index bbb8edb62e00..6ca497473d78 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -627,6 +627,7 @@ static void amd_pmc_dbgfs_unregister(struct amd_pmc_dev *dev)

    static bool amd_pmc_is_stb_supported(struct amd_pmc_dev *dev)
    {
+    printk(KERN_INFO "amd_pmc_is_stb_supported cpu_id: %d\n",
+ dev->cpu_id);
          switch (dev->cpu_id) {
          case AMD_CPU_ID_YC:
          case AMD_CPU_ID_CB:
@@ -986,11 +987,13 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
          dev->msg_port = 1;

          amd_pmc_send_cmd(dev, S2D_TELEMETRY_SIZE, &size, dev->s2d_msg_id, true);
+       printk(KERN_INFO "amd_pmc_s2d_init size: %u\n", size);
          if (size != S2D_TELEMETRY_BYTES_MAX)
                  return -EIO;

          /* Get DRAM size */
          ret = amd_pmc_send_cmd(dev, S2D_DRAM_SIZE, &dev->dram_size, dev->s2d_msg_id, true);
+       printk(KERN_INFO "amd_pmc_s2d_init s2d_dram_size ret: %d\n", ret);
          if (ret || !dev->dram_size)
                  dev->dram_size = S2D_TELEMETRY_DRAMBYTES_MAX;

@@ -1003,7 +1006,9 @@ static int amd_pmc_s2d_init(struct amd_pmc_dev *dev)
          /* Clear msg_port for other SMU operation */
          dev->msg_port = 0;

+       printk(KERN_INFO "amd_pmc_s2d_init p_a_l: %u p_a_hi: %u s_p_a: %llu sz: %u\n", phys_addr_low, phys_addr_hi, stb_phys_addr, dev->dram_size);
          dev->stb_virt_addr = devm_ioremap(dev->dev, stb_phys_addr, dev->dram_size);
+       printk(KERN_INFO "amd_pmc_s2d_init dsva: %p\n", dev->stb_virt_addr);
          if (!dev->stb_virt_addr)
                  return -ENOMEM;

@@ -1047,6 +1052,7 @@ static int amd_pmc_probe(struct platform_device *pdev)
          int err;
          u32 val;

+       printk(KERN_INFO "amd_pmc_probe: 1\n");
          dev->dev = &pdev->dev;

          rdev = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0, 0));
@@ -1057,12 +1063,14 @@ static int amd_pmc_probe(struct platform_device *pdev)

          dev->cpu_id = rdev->device;

+       printk(KERN_INFO "amd_pmc_probe: 2\n");
          if (dev->cpu_id == AMD_CPU_ID_SP) {
                  dev_warn_once(dev->dev, "S0i3 is not supported on this hardware\n");
                  err = -ENODEV;
                  goto err_pci_dev_put;
          }

+       printk(KERN_INFO "amd_pmc_probe: 3\n");
          dev->rdev = rdev;
          err = amd_smn_read(0, AMD_PMC_BASE_ADDR_LO, &val);
          if (err) {
@@ -1073,6 +1081,7 @@ static int amd_pmc_probe(struct platform_device *pdev)

          base_addr_lo = val & AMD_PMC_BASE_ADDR_HI_MASK;

+       printk(KERN_INFO "amd_pmc_probe: 4\n");
          err = amd_smn_read(0, AMD_PMC_BASE_ADDR_HI, &val);
          if (err) {
                  dev_err(dev->dev, "error reading 0x%x\n", AMD_PMC_BASE_ADDR_HI);
@@ -1085,6 +1094,7 @@ static int amd_pmc_probe(struct platform_device *pdev)

          dev->regbase = devm_ioremap(dev->dev, base_addr + AMD_PMC_BASE_ADDR_OFFSET,
                                      AMD_PMC_MAPPING_SIZE);
+       printk(KERN_INFO "amd_pmc_probe: 5\n");
          if (!dev->regbase) {
                  err = -ENOMEM;
                  goto err_pci_dev_put;
@@ -1095,24 +1105,31 @@ static int amd_pmc_probe(struct platform_device *pdev)
          /* Get num of IP blocks within the SoC */
          amd_pmc_get_ip_info(dev);

+       printk(KERN_INFO "amd_pmc_probe: 6\n");
          if (enable_stb && amd_pmc_is_stb_supported(dev)) {
                  err = amd_pmc_s2d_init(dev);
+               printk(KERN_INFO "amd_pmc_probe: 6a\n");
                  if (err)
                          goto err_pci_dev_put;
          }

+       printk(KERN_INFO "amd_pmc_probe: 7\n");
          platform_set_drvdata(pdev, dev);
          if (IS_ENABLED(CONFIG_SUSPEND)) {
                  err = acpi_register_lps0_dev(&amd_pmc_s2idle_dev_ops);
+               printk(KERN_INFO "amd_pmc_probe: 7a\n");
                  if (err)
                          dev_warn(dev->dev, "failed to register LPS0 sleep handler, expect increased power consumption\n");
                  if (!disable_workarounds)
                          amd_pmc_quirks_init(dev);
          }

+       printk(KERN_INFO "amd_pmc_probe: 8\n");
          amd_pmc_dbgfs_register(dev);
-       if (IS_ENABLED(CONFIG_AMD_MP2_STB))
+       if (IS_ENABLED(CONFIG_AMD_MP2_STB)) {
+               printk(KERN_INFO "amd_pmc_probe: calling amd_mp2_stb_init\n");
                  amd_mp2_stb_init(dev);
+    }
          pm_report_max_hw_sleep(U64_MAX);
          return 0;

----------------------------------------------------------------------

Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 1
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 2
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 3
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 4
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 5
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6
Oct 12 00:20:01 lizard kernel: amd_pmc_is_stb_supported cpu_id: 5352
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init size: 1048576
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: SMU cmd failed. err: 0xff
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init s2d_dram_size ret: -5
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init p_a_l: 0 p_a_hi: 0 s_p_a: 0 sz: 16777216
Oct 12 00:20:01 lizard kernel: ------------[ cut here ]------------
Oct 12 00:20:01 lizard kernel: ioremap on RAM at 0x0000000000000000 - 0x0000000000ffffff
Oct 12 00:20:01 lizard kernel: WARNING: CPU: 10 PID: 2151 at arch/x86/mm/ioremap.c:217 __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Modules linked in: amd_pmc(+) ccm cpufreq_userspace cpufreq_powersave cpufreq_conservative sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat typec_displayport amdgpu snd_sof_amd_rembrandt amdxcp drm_exec snd_sof_amd_acp gpu_sched btusb snd_sof_pci drm_buddy snd_sof_xtensa_dsp btrtl drm_suballoc_helper snd_hda_codec_realtek amd_atl drm_display_helper btintel intel_rapl_msr snd_sof btbcm intel_rapl_common snd_hda_codec_generic snd_sof_utils cec btmtk snd_hda_scodec_component snd_hda_codec_hdmi snd_soc_core uvcvideo mt7921e snd_compress videobuf2_vmalloc rc_core snd_pcm_dmaengine uvc snd_hda_intel mt7921_common drm_ttm_helper videobuf2_memops snd_pci_ps snd_intel_dspcfg snd_rpl_pci_acp6x snd_intel_sdw_acpi mt792x_lib videobuf2_v4l2 snd_pci_acp6x edac_mce_amd ttm snd_pci_acp5x mt76_connac_lib snd_hda_codec snd_rn_pci_acp3x videodev bluetooth drm_kms_helper snd_acp_config mt76 snd_hda_core videobuf2_common snd_soc_acpi i2c_algo_bit mc crc16 snd_hwdep snd_pci_acp3x amd_pmf kvm_amd amdtee mac80211 hid_sensor_als
Oct 12 00:20:01 lizard kernel:  hid_sensor_trigger ccp libarc4 ucsi_acpi hid_sensor_iio_common kvm industrialio_triggered_buffer amd_sfh typec_ucsi kfifo_buf leds_cros_ec cros_usbpd_charger tee typec snd_pcsp cros_ec_hwmon platform_profile led_class_multicolor rapl cros_usbpd_notify cfg80211 cros_ec_sysfs industrialio roles cros_usbpd_logger cros_ec_debugfs cros_charge_control cros_ec_chardev wmi_bmof sp5100_tco button ac k10temp watchdog rfkill cpufreq_ondemand snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev i2c_dev sidewinder gameport joydev parport_pc ppdev lp parport efi_pstore configfs nfnetlink ip_tables x_tables autofs4 xfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 cdc_ncm cdc_ether usbnet r8152 mii libphy usbhid raid1 hid_multitouch hid_sensor_hub hid_generic md_mod crct10dif_pclmul i2c_hid_acpi crc32_pclmul xhci_pci i2c_hid crc32c_intel cros_ec_dev xhci_hcd ghash_clmulni_intel
Oct 12 00:20:01 lizard kernel:  cros_ec_lpcs sha512_ssse3 cros_ec nvme sha256_ssse3 usbcore drm thunderbolt sha1_ssse3 i2c_piix4 video nvme_core i2c_smbus usb_common battery wmi hid aesni_intel gf128mul crypto_simd cryptd [last unloaded: amd_pmc]
Oct 12 00:20:01 lizard kernel: CPU: 10 UID: 0 PID: 2151 Comm: modprobe Not tainted 6.12.0-rc2+ #8
Oct 12 00:20:01 lizard kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
Oct 12 00:20:01 lizard kernel: RIP: 0010:__ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel: Code: 1f fe ff ff 80 3d 7e 33 d8 01 00 75 9d 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 9f ae 6c 82 c6 05 64 33 d8 01 01 e8 53 d8 01 00 <0f> 0b e9 79 ff ff ff 83 fd 04 75 35 bf 04 00 00 00 e8 ad a0 ff ff
Oct 12 00:20:01 lizard kernel: RSP: 0018:ffffaee502663a18 EFLAGS: 00010282
Oct 12 00:20:01 lizard kernel: RAX: 0000000000000000 RBX: ffff8fe3066563e8 RCX: 0000000000000027
Oct 12 00:20:01 lizard kernel: RDX: ffff8ff1dff21788 RSI: 0000000000000001 RDI: ffff8ff1dff21780
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: ffffaee502663898
Oct 12 00:20:01 lizard kernel: R10: ffffffff82eb40e8 R11: 0000000000000003 R12: 0000000001000000
Oct 12 00:20:01 lizard kernel: R13: 0000000001000000 R14: 0000000000000000 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel: FS:  00007f5e2c747640(0000) GS:ffff8ff1dff00000(0000) knlGS:0000000000000000
Oct 12 00:20:01 lizard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 12 00:20:01 lizard kernel: CR2: 00007f5e2bf7e700 CR3: 000000010d2a2000 CR4: 0000000000750ef0
Oct 12 00:20:01 lizard kernel: PKRU: 55555554
Oct 12 00:20:01 lizard kernel: Call Trace:
Oct 12 00:20:01 lizard kernel:  <TASK>
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? __warn.cold+0x93/0xf6
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? report_bug+0xff/0x140
Oct 12 00:20:01 lizard kernel:  ? console_unlock+0x9d/0x140
Oct 12 00:20:01 lizard kernel:  ? handle_bug+0x58/0x90
Oct 12 00:20:01 lizard kernel:  ? exc_invalid_op+0x17/0x70
Oct 12 00:20:01 lizard kernel:  ? asm_exc_invalid_op+0x1a/0x20
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? __ioremap_caller+0x2cd/0x340
Oct 12 00:20:01 lizard kernel:  ? devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel:  ? __pfx_devm_ioremap_release+0x10/0x10
Oct 12 00:20:01 lizard kernel:  devm_ioremap+0x49/0x80
Oct 12 00:20:01 lizard kernel:  amd_pmc_probe+0x41a/0x5ac [amd_pmc]
Oct 12 00:20:01 lizard kernel:  platform_probe+0x41/0xa0
Oct 12 00:20:01 lizard kernel:  really_probe+0xdb/0x340
Oct 12 00:20:01 lizard kernel:  ? pm_runtime_barrier+0x54/0x90
Oct 12 00:20:01 lizard kernel:  ? __pfx___driver_attach+0x10/0x10
Oct 12 00:20:01 lizard kernel:  __driver_probe_device+0x78/0x110
Oct 12 00:20:01 lizard kernel:  driver_probe_device+0x1f/0xa0
Oct 12 00:20:01 lizard kernel:  __driver_attach+0xba/0x1c0
Oct 12 00:20:01 lizard kernel:  bus_for_each_dev+0x8c/0xe0
Oct 12 00:20:01 lizard kernel:  bus_add_driver+0x112/0x1f0
Oct 12 00:20:01 lizard kernel:  driver_register+0x72/0xd0
Oct 12 00:20:01 lizard kernel:  ? __pfx_amd_pmc_driver_init+0x10/0x10 [amd_pmc]
Oct 12 00:20:01 lizard kernel:  do_one_initcall+0x58/0x310
Oct 12 00:20:01 lizard kernel:  do_init_module+0x60/0x230
Oct 12 00:20:01 lizard kernel:  init_module_from_file+0x86/0xc0
Oct 12 00:20:01 lizard kernel:  idempotent_init_module+0x11e/0x310
Oct 12 00:20:01 lizard kernel:  __x64_sys_finit_module+0x5e/0xb0
Oct 12 00:20:01 lizard kernel:  do_syscall_64+0x82/0x190
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? __count_memcg_events+0x53/0xf0
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? handle_mm_fault+0x1bb/0x2c0
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? do_user_addr_fault+0x36c/0x620
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Oct 12 00:20:01 lizard kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 12 00:20:01 lizard kernel: RIP: 0033:0x7f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 5c 0d 00 f7 d8 64 89 01 48
Oct 12 00:20:01 lizard kernel: RSP: 002b:00007ffdc1020768 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Oct 12 00:20:01 lizard kernel: RAX: ffffffffffffffda RBX: 000055bc6cec2e70 RCX: 00007f5e2bf1b0e9
Oct 12 00:20:01 lizard kernel: RDX: 0000000000000000 RSI: 000055bc6cec3220 RDI: 0000000000000003
Oct 12 00:20:01 lizard kernel: RBP: 0000000000000000 R08: 00007f5e2bff1b20 R09: 0000000000000000
Oct 12 00:20:01 lizard kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 000055bc6cec3220
Oct 12 00:20:01 lizard kernel: R13: 0000000000040000 R14: 000055bc6cec2f10 R15: 0000000000000000
Oct 12 00:20:01 lizard kernel:  </TASK>
Oct 12 00:20:01 lizard kernel: ---[ end trace 0000000000000000 ]---
Oct 12 00:20:01 lizard kernel: amd_pmc_s2d_init dsva: 0000000000000000
Oct 12 00:20:01 lizard kernel: amd_pmc_probe: 6a
Oct 12 00:20:01 lizard kernel: amd_pmc AMDI0009:00: probe with driver amd_pmc failed with error -12
----------------------------------------------------------------------


Thank you,
Corey


The STB functionality issue and your suspend issue are tangential issues.

Yes, I was hoping to be able to use STB to help troubleshoot. I do not
know if that is the right approach.

You mentioned in the linked post that you didn't find any issues
reported from amd_s2idle.py [1] and also can't trigger this issue at
will.  Could you post your report generated by that script to a gist or
somewhere non-ephemeral?

Yes, I did a 10-cycle run today and posted that here:

https://fatooh.org/bugreports/2024-10-14-s2idle/s2idle_report-2024-10-14.txt

I also included the output of 'journalctl -b'.

https://fatooh.org/bugreports/2024-10-14-s2idle/journalctl-b

One thing I _have_ recently seen reproduced with amd_s2idle.py is that
the laptop sometimes ends up rebooting instead of automatically
resuming. I don't know if this is related; I mention it now just in
case. I saw this with 6.10.6 a few days ago and again with the test
kernel as originally reported (git 09f6b0c8904bf plus my debug patch).


In case they are useful, I posted the log from that run as well as
the output of 'journalctl -b -1'. There's probably not much to see,
though--the logs cut off, as expected.

https://fatooh.org/bugreports/2024-10-14-s2idle/s2idle_report-2024-10-14.txt.rebooted
https://fatooh.org/bugreports/2024-10-14-s2idle/journalctl-b-1


Something I think notable about your system is you are using two SSDs
which is (relatively) uncommon.  Have you already updated the firmware
on both SSDs to the latest?

I have not, it seems. The drives come with stock firmware:
$ sudo nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            241802800078         WD_BLACK SN770 1TB                       0x1          1.00  TB /   1.00  TB    512   B +  0 B   731100WD
/dev/nvme1n1          /dev/ng1n1            24102U800015         WD_BLACK SN770M 1TB                      0x1          1.00  TB /   1.00  TB    512   B +  0 B   731100WD

...and it seems that version 731120WD is available for each. I can
try upgrading later (one at a time, with maybe a day or so in between).

For reference:
https://community.wd.com/t/firmware-upgrade-utility-for-linux/210120/13
https://community.frame.work/t/western-digital-drive-update-guide-without-windows-wd-dashboard/20616
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_1TB/731120WD/device_properties.xml
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770M_1TB/731120WD/device_properties.xml

If so, would it be possible to try running with just one SSD for a week or
so and see if this issue comes back?  If it doesn't come back there
could be a BIOS bug with how it's handling your combination of the 2x
SSDs and you should report it to Framework.

I'm running an MD RAID, so yes, I can try removing a drive for a while.
I'll try that if I still have trouble after the SSD firmware update.
The rarity of the problems (so far) means it will probably take some
weeks before I have useful information. I'll keep trying.

Thank you for your help so far.

-Corey



