On 12/16/2023 1:25 AM, Mario Limonciello wrote:
The SW CTF delayed work handler triggers a shutdown if a sensor
read fails for any reason.
A busy sensor is a transient condition, however, so that specific
case should be retried to ensure that a good value can be returned.
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 963cf6e76935..5eb46b6bad43 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1182,6 +1182,12 @@ static void smu_swctf_delayed_work_handler(struct work_struct *work)
if (hotspot_tmp / 1000 < range->software_shutdown_temp)
return;
break;
+ case -EBUSY:
In patch 1, -EBUSY is presently returned in only two cases:
1) RAS interrupt: a RAS interrupt will eventually result in a reset of
the device. All processes running on the device are going to be
suspended before that, so a reschedule here won't be necessary.
2) Only on arcturus, aldebaran and SMU v13.0.6: aldebaran and SMU
v13.0.6 don't use SW CTF (on aldebaran the SW CTF limit is set in such
a way that it won't be hit). I don't know about SW CTF usage in arcturus.
Thanks,
Lijo
+ dev_warn(adev->dev, "Unable to read hotspot sensor, retrying in %d ms\n",
+ AMDGPU_SWCTF_EXTRA_DELAY);
+ schedule_delayed_work(&smu->swctf_delayed_work,
+ msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY));
+ return;
default:
dev_err(adev->dev, "Failed to read hotspot temperature: %d\n", r);
}
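
For reference, the decision the patched handler makes can be sketched as a standalone function. This is only an illustration built from the hunk above and the commit message: `swctf_decide()` and the `swctf_action` enum are hypothetical names that do not exist in amdgpu, and it assumes that falling out of the switch leads to the shutdown path.

```c
#include <errno.h>

/* Hypothetical names for illustration only; not part of the amdgpu driver. */
enum swctf_action {
	SWCTF_NO_ACTION,	/* hotspot is below the software shutdown limit */
	SWCTF_RETRY,		/* sensor was busy: re-arm the delayed work */
	SWCTF_SHUTDOWN,		/* over the limit, or the read failed outright */
};

/*
 * r:                 return code of the sensor read (0 or negative errno)
 * hotspot_mdeg:      hotspot temperature in millidegrees Celsius
 * shutdown_temp_deg: software shutdown limit in degrees Celsius
 */
static enum swctf_action swctf_decide(int r, int hotspot_mdeg,
				      int shutdown_temp_deg)
{
	switch (r) {
	case 0:
		/* Good read: act only if the limit has been reached. */
		if (hotspot_mdeg / 1000 < shutdown_temp_deg)
			return SWCTF_NO_ACTION;
		return SWCTF_SHUTDOWN;
	case -EBUSY:
		/* Transient condition: try again after the extra delay. */
		return SWCTF_RETRY;
	default:
		/* Assumed here: any other failure continues to shutdown. */
		return SWCTF_SHUTDOWN;
	}
}
```

With this shape, only -EBUSY leads to a reschedule; every other read failure keeps the existing behaviour.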