On 11/16/2021 12:53 PM, Tao Zhou wrote:
If gpu reset is triggered by ras fatal error, tell it to smu in mode-1
reset message.
Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx>
---
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 21 ++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 35145db6eedf..6f3d064a8232 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu)
int smu_v13_0_mode1_reset(struct smu_context *smu)
{
- u32 smu_version;
+ u32 smu_version, fatal_err, param;
int ret = 0;
+ struct amdgpu_device *adev = smu->adev;
+ struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
+
+ fatal_err = 0;
+ param = SMU_RESET_MODE_1;
+
/*
* PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
*/
smu_cmn_get_smc_version(smu, NULL, &smu_version);
if (smu_version < 0x00440700)
ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
- else
- ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL);
+ else {
+ /* fatal error triggered by ras, PMFW supports the flag
+ from 68.44.0 */
+ if ((smu_version >= 0x00442c00) && ras &&
+ atomic_read(&ras->in_recovery))
+ fatal_err = 1;
+
From PMFW version, this looks specific to aldebaran. Since there is
version check as well, the implementation needs to be moved to
aldebaran_ppt.c
Thanks,
Lijo
+ param |= (fatal_err << 16);
+ ret = smu_cmn_send_smc_msg_with_param(smu,
+ SMU_MSG_GfxDeviceDriverReset, param, NULL);
+ }
if (!ret)
msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);