On 10/14/2024 18:31, Corey Hickey wrote:
The STB functionality issue and your suspend issue are tangential issues.
Yes, I was hoping to be able to use STB to help troubleshoot. I do not
know if that is the right approach.
I don't think it will help you in this context. Even if STB was enabled
by your BIOS you wouldn't be able to access it from Linux if the host
froze or rebooted for some reason.
You mentioned in the linked post that you didn't find any issues
reported from amd_s2idle.py [1] and also can't trigger this issue at
will. Could you post your report generated by that script to a gist or
somewhere non-ephemeral?
Yes, I did a 10-cycle run today and posted that here:
https://fatooh.org/bugreports/2024-10-14-s2idle/s2idle_report-2024-10-14.txt
Yeah nothing particularly stands out here.
I also included the output of 'journalctl -b'.
https://fatooh.org/bugreports/2024-10-14-s2idle/journalctl-b
One thing I _have_ recently seen reproduced with amd_s2idle.py is that
the laptop sometimes ends up rebooting instead of automatically
resuming. I don't know if this is related; I mention it now just in
case. I saw this with 6.10.6 a few days ago and again with the test
kernel as originally reported (git 09f6b0c8904bf plus my debug patch).
In case they are useful, I posted the log from that run as well as
the output of 'journalctl -b -1'. There's probably not much to see,
though--the logs cut off, as expected.
https://fatooh.org/bugreports/2024-10-14-s2idle/s2idle_report-2024-10-14.txt.rebooted
https://fatooh.org/bugreports/2024-10-14-s2idle/journalctl-b-1
Something I think notable about your system is you are using two SSDs
which is (relatively) uncommon. Have you already updated the firmware
on both SSDs to the latest?
I have not, it seems. The drives come with stock firmware:
$ sudo nvme list
Node          Generic     SN            Model                Namespace  Usage              Format       FW Rev
------------  ----------  ------------  -------------------  ---------  -----------------  -----------  --------
/dev/nvme0n1  /dev/ng0n1  241802800078  WD_BLACK SN770 1TB   0x1        1.00 TB / 1.00 TB  512 B + 0 B  731100WD
/dev/nvme1n1  /dev/ng1n1  24102U800015  WD_BLACK SN770M 1TB  0x1        1.00 TB / 1.00 TB  512 B + 0 B  731100WD
...and it seems that version 731120WD is available for each. I can
try upgrading later (one at a time, with maybe a day or so in between).
For reference:
https://community.wd.com/t/firmware-upgrade-utility-for-linux/210120/13
https://community.frame.work/t/western-digital-drive-update-guide-without-windows-wd-dashboard/20616
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_1TB/731120WD/device_properties.xml
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770M_1TB/731120WD/device_properties.xml
Before you upgrade, can you please also capture 'fwupdmgr get-devices
--json' output? If the SSD upgrade helps you, I do want to flag this
issue in amd_s2idle.py in the future, so that anyone else with the same
SSD + SSD F/W is told they should upgrade too.
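One way to capture that state before touching the firmware, as a
minimal sketch assuming a POSIX shell. The helper name save_output and
the file names in the usage note are hypothetical; only the two
commands themselves come from this thread.

```shell
# save_output DIR NAME CMD...
# Hypothetical helper: run CMD and store its standard output in
# DIR/NAME, creating DIR first if it does not exist yet.
save_output() {
    dir=$1
    name=$2
    shift 2
    mkdir -p "$dir" && "$@" > "$dir/$name"
}
```

It could then be used along these lines, once before and once after
the upgrade, so the two sets of files can be compared:

    out=fw-state-$(date +%Y-%m-%d)
    save_output "$out" fwupd-devices.json fwupdmgr get-devices --json
    save_output "$out" nvme-list.txt sudo nvme list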
If so, would it be possible to try running with just one SSD for a week
or so and see if this issue comes back? If it doesn't come back there
could be a BIOS bug with how it's handling your combination of the 2x
SSDs and you should report it to Framework.
I'm running an MD RAID, so yes, I can try removing a drive for a while.
I'll try that if I still have trouble after the SSD firmware update.
The rarity of the problems (so far) means it will probably take some
weeks before I have useful information. I'll keep trying.
OK.
Thank you for your help so far.
Sure.