Re: please help with intermittent s2idle problem on AMD laptop

On 2024-10-15 07:04, Mario Limonciello wrote:
On 10/14/2024 18:31, Corey Hickey wrote:
The STB functionality issue and your suspend issue are tangential issues.

Yes, I was hoping to be able to use STB to help troubleshoot. I do not
know if that is the right approach.

I don't think it will help you in this context. Even if STB was enabled
by your BIOS you wouldn't be able to access it from Linux if the host
froze or rebooted for some reason.

Ah ok. I did not know the STB was non-persistent.

Anyhow, I reported the lack of AMD CBS to framework support and they
logged it as a feature request.

If there's any use of me testing anything for the STB further, I'm
definitely willing to try, but otherwise, I'll move on.

Something I think notable about your system is you are using two SSDs
which is (relatively) uncommon.  Have you already updated the firmware
on both SSDs to the latest?

I have not, it seems. The drives come with stock firmware:
$ sudo nvme list
Node          Generic     SN            Model                Namespace  Usage              Format       FW Rev
------------  ----------  ------------  -------------------  ---------  -----------------  -----------  --------
/dev/nvme0n1  /dev/ng0n1  241802800078  WD_BLACK SN770 1TB   0x1        1.00 TB / 1.00 TB  512 B + 0 B  731100WD
/dev/nvme1n1  /dev/ng1n1  24102U800015  WD_BLACK SN770M 1TB  0x1        1.00 TB / 1.00 TB  512 B + 0 B  731100WD

...and it seems that version 731120WD is available for each. I can
try upgrading later (one at a time, with maybe a day or so in between).

For reference:
https://community.wd.com/t/firmware-upgrade-utility-for-linux/210120/13
https://community.frame.work/t/western-digital-drive-update-guide-without-windows-wd-dashboard/20616
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_1TB/731120WD/device_properties.xml
https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770M_1TB/731120WD/device_properties.xml

Before you upgrade, can you please also capture 'fwupdmgr get-devices
--json' output?  If the SSD upgrade helps you, I do want to flag this
issue in amd_s2idle.py for the future, so that anyone else with the same
SSD + SSD F/W is told they should upgrade too.

I already updated the SN770 SSD last night, but it had the same firmware
as is still on the SN770M. The current output is below.

    {
      "Name" : "WD BLACK SN770 1TB",
      "DeviceId" : "3743975ad7f64f8d6575a9ae49fb3a8856fe186f",
      "InstanceIds" : [
        "NVME\\VEN_15B7&DEV_5017",
        "NVME\\VEN_15B7&DEV_5017&SUBSYS_15B75017",
        "WD_BLACK SN770 1TB"
      ],
      "Guid" : [
        "1524d43d-ed91-5130-8cb6-8b8478508bae",
        "87cfda90-ce08-52c3-9bb5-0e0718b7e57e",
        "914bfa00-b683-532c-8c3c-71a59e7ae800"
      ],
      "Serial" : "241802800078",
      "Summary" : "NVM Express solid state drive",
      "Plugin" : "nvme",
      "Protocol" : "org.nvmexpress",
      "Flags" : [
        "internal",
        "updatable",
        "require-ac",
        "registered",
        "needs-reboot",
        "usable-during-update"
      ],
      "Vendor" : "Sandisk Corp",
      "VendorId" : "NVME:0x15B7",
      "Version" : "731120WD",
      "VersionFormat" : "plain",
      "Icons" : [
        "drive-harddisk"
      ],
      "Created" : 1729019306
    },
    {
      "Name" : "WD BLACK SN770M 1TB",
      "DeviceId" : "71b677ca0f1bc2c5b804fa1d59e52064ce589293",
      "InstanceIds" : [
        "NVME\\VEN_15B7&DEV_5042",
        "NVME\\VEN_15B7&DEV_5042&SUBSYS_15B75042",
        "WD_BLACK SN770M 1TB"
      ],
      "Guid" : [
        "c3e81c2c-00bb-55d1-b384-b11e2b85146c",
        "0e7ea477-bf9e-5d83-9b17-54fe83b54e01",
        "f8a47d37-820f-5df1-a63d-0231d8c00de6"
      ],
      "Serial" : "24102U800015",
      "Summary" : "NVM Express solid state drive",
      "Plugin" : "nvme",
      "Protocol" : "org.nvmexpress",
      "Flags" : [
        "internal",
        "updatable",
        "require-ac",
        "registered",
        "needs-reboot",
        "usable-during-update"
      ],
      "Vendor" : "Sandisk Corp",
      "VendorId" : "NVME:0x15B7",
      "Version" : "731100WD",
      "VersionFormat" : "plain",
      "Icons" : [
        "drive-harddisk"
      ],
      "Created" : 1729019306
    }
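
For anyone comparing firmware revisions across drives, here is a minimal
Python sketch that pulls the name and version of each NVMe device out of
the 'fwupdmgr get-devices --json' output. It assumes the device entries
sit under a top-level "Devices" key, which is not visible in the excerpt
above; the function name and the trimmed SAMPLE are illustrative only.

```python
import json


def nvme_firmware_versions(fwupd_json: str) -> dict[str, str]:
    """Map NVMe device name -> firmware version from fwupdmgr JSON text."""
    data = json.loads(fwupd_json)
    # Assumption: device entries are listed under a top-level "Devices" key.
    devices = data.get("Devices", [])
    return {
        dev["Name"]: dev["Version"]
        for dev in devices
        # Keep only entries handled by the nvme plugin that carry both fields.
        if dev.get("Plugin") == "nvme" and "Name" in dev and "Version" in dev
    }


# Trimmed-down sample built from the two entries quoted above.
SAMPLE = json.dumps({"Devices": [
    {"Name": "WD BLACK SN770 1TB", "Plugin": "nvme", "Version": "731120WD"},
    {"Name": "WD BLACK SN770M 1TB", "Plugin": "nvme", "Version": "731100WD"},
]})

if __name__ == "__main__":
    for name, version in nvme_firmware_versions(SAMPLE).items():
        print(f"{name}: {version}")
```

In practice the input would come from saving the command output to a file
and reading it back, rather than from the embedded sample.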


This morning, I found the laptop unable to resume; this is still with
the test kernel I've been using since I first reported the issue here. I
have needed to roll back to 6.10.6-amd64 now, though, due to some
graphical issues (which I have not yet investigated and presume are
unrelated).

I did notice something else meanwhile:

$ for i in nvme0 nvme1 ; do echo "-- $i --" ; sudo nvme smart-log "/dev/$i" ; done
-- nvme0 --
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning			: 0
temperature				: 86 °F (303 K)
available_spare				: 100%
available_spare_threshold		: 10%
percentage_used				: 0%
endurance group critical warning summary: 0
Data Units Read				: 5,322,091 (2.72 TB)
Data Units Written			: 163,768 (83.85 GB)
host_read_commands			: 12,838,207
host_write_commands			: 1,674,918
controller_busy_time			: 21
power_cycles				: 157
power_on_hours				: 8
unsafe_shutdowns			: 8
media_errors				: 0
num_err_log_entries			: 0
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Temperature Sensor 1			: 102 °F (312 K)
Temperature Sensor 2			: 86 °F (303 K)
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0
-- nvme1 --
Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning			: 0
temperature				: 86 °F (303 K)
available_spare				: 100%
available_spare_threshold		: 10%
percentage_used				: 0%
endurance group critical warning summary: 0
Data Units Read				: 3,924,594 (2.01 TB)
Data Units Written			: 3,422,763 (1.75 TB)
host_read_commands			: 15,470,118
host_write_commands			: 5,547,092
controller_busy_time			: 38
power_cycles				: 5,745
power_on_hours				: 45
unsafe_shutdowns			: 5,597
media_errors				: 0
num_err_log_entries			: 0
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Temperature Sensor 1			: 105 °F (314 K)
Temperature Sensor 2			: 86 °F (303 K)
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0



For nvme1, the power_cycles and unsafe_shutdowns values look very fishy,
especially in comparison to nvme0. These two SSDs are new and have both
been present in the laptop since I assembled it.
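
To put a number on that comparison, here is a rough Python sketch that
parses the plain-text smart-log output and computes the fraction of power
cycles that ended in an unsafe shutdown. The function names are made up
for illustration, and the parser only handles the simple "name : integer"
lines shown above. With the figures above, this works out to roughly 5%
for nvme0 and roughly 97% for nvme1.

```python
import re


def parse_smart_log(text: str) -> dict[str, int]:
    """Extract plain integer counters from `nvme smart-log` text output."""
    counters = {}
    for line in text.splitlines():
        # Matches lines such as "power_cycles    : 5,745"; lines with
        # units or percent signs are deliberately skipped.
        match = re.match(r"^\s*([A-Za-z_]+)\s*:\s*([\d,]+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2).replace(",", ""))
    return counters


def unsafe_shutdown_ratio(counters: dict[str, int]) -> float:
    """Fraction of power cycles that were recorded as unsafe shutdowns."""
    return counters["unsafe_shutdowns"] / counters["power_cycles"]


# Relevant lines copied from the smart-log output above.
NVME0 = "power_cycles\t\t\t\t: 157\npower_on_hours\t\t\t\t: 8\nunsafe_shutdowns\t\t\t: 8\n"
NVME1 = "power_cycles\t\t\t\t: 5,745\npower_on_hours\t\t\t\t: 45\nunsafe_shutdowns\t\t\t: 5,597\n"

if __name__ == "__main__":
    for name, text in (("nvme0", NVME0), ("nvme1", NVME1)):
        ratio = unsafe_shutdown_ratio(parse_smart_log(text))
        print(f"{name}: {ratio:.1%} of power cycles were unsafe shutdowns")
```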

I am unsure about the power_on_hours; 45 might be too high and 8 seems
too low.

The differences in reads and writes are (mostly?) explained by being in
a RAID--one drive did the initial sync to the other drive.

Unfortunately, I don't have a reading from right after installation, so
I don't know if I received a bad drive (I bought it new and the
packaging seemed intact). I also hesitate to blame the SSD for those
values--it could be a victim of system trouble, I think.

I will track these more as I go.
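
One possible way to do that tracking is sketched below in Python: take a
snapshot of the three counters before and after a suspend cycle, compare
them, and append each snapshot to a CSV file. This assumes that nvme-cli's
JSON output ("--output-format=json") uses the same key names as the
plain-text output for these counters, which I have not verified on this
system. The function names and CSV layout are illustrative.

```python
import csv
import json
import subprocess
import time

TRACKED = ("power_cycles", "unsafe_shutdowns", "power_on_hours")


def read_counters(device: str) -> dict[str, int]:
    """Read the tracked SMART counters via nvme-cli (needs root)."""
    out = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    log = json.loads(out)
    # Assumption: the JSON keys match the plain-text counter names.
    return {key: int(log[key]) for key in TRACKED}


def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in each tracked counter between two snapshots."""
    return {key: after[key] - before[key] for key in TRACKED}


def append_sample(path: str, device: str, counters: dict[str, int]) -> None:
    """Append one timestamped row (epoch seconds, device, counters) to a CSV."""
    row = [int(time.time()), device] + [counters[key] for key in TRACKED]
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)


# Intended use (not run here, since it needs root and real hardware):
#   before = read_counters("/dev/nvme1")
#   ... suspend and resume ...
#   after = read_counters("/dev/nvme1")
#   print(counter_deltas(before, after))
```

If unsafe_shutdowns rises by the same amount as power_cycles across a
single suspend/resume, that would suggest the drive is losing power
without a clean shutdown during s2idle, though it would not say whether
the drive or the platform is at fault.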


Thank you,
Corey



