RE: [PATCH 06/10] drm/amdgpu: add condition to enable baco for xgmi/ras case

[AMD Public Use]

After thinking about it a bit, I think we can rely on the PMFW version alone to decide between RAS recovery and legacy fatal_error handling on platforms that support RAS. Is leveraging amdgpu_ras_enable as a temporary solution really necessary? Even if BACO RAS recovery is not stable, the result is the same as with legacy fatal_error handling: the user has to reboot the node manually.

So the new SoC reset use cases are:
XGMI (without RAS): use PSP mode1 based chain reset,
RAS enabled (with PMFW 40.52 and onwards): use BACO based RAS recovery,
RAS enabled (with PMFW prior to 40.52): use legacy fatal_error handling.
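
The three cases above can be sketched as a small standalone decision function. This is only an illustration, not driver code: the enum, the macro names, and pick_reset_path() are hypothetical, and the one-byte-per-field version packing is an assumption inferred from 0x283400 in the patch (0x28 = 40, 0x34 = 52). The fallback for a device with neither XGMI nor RAS is also an assumption, since the list does not cover it. The sketch treats 40.52.0 itself as new enough, per "40.52 and onwards"; the patch as posted compares with "<= 0x283400", which would exclude exactly that version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not the driver's enums. */
enum reset_path {
	RESET_DEFAULT,            /* assumption: no XGMI, no RAS -> default unchanged */
	RESET_MODE1_CHAIN,        /* XGMI without RAS: PSP mode1 based chain reset */
	RESET_BACO_RAS_RECOVERY,  /* RAS with PMFW 40.52 or newer */
	RESET_LEGACY_FATAL_ERROR, /* RAS with PMFW older than 40.52 */
};

/* Assumption: major.minor.patch are packed one byte each. */
#define PMFW_VERSION(maj, min, patch) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(patch))

/* 40.52.0, which packs to 0x283400 under the assumption above. */
#define PMFW_MIN_BACO_RAS PMFW_VERSION(40, 52, 0)

static enum reset_path pick_reset_path(bool in_xgmi_hive, bool ras_supported,
				       uint32_t pmfw_version)
{
	/* RAS takes priority: the PMFW version alone selects the handling. */
	if (ras_supported)
		return pmfw_version >= PMFW_MIN_BACO_RAS ?
			RESET_BACO_RAS_RECOVERY : RESET_LEGACY_FATAL_ERROR;

	/* XGMI hive without RAS. */
	if (in_xgmi_hive)
		return RESET_MODE1_CHAIN;

	return RESET_DEFAULT;
}
```

With this shape, amdgpu_ras_enable does not appear in the decision at all, which is the point of the suggestion.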
 
Anything else?

Regards,
Hawking
-----Original Message-----
From: Le Ma <le.ma@xxxxxxx> 
Sent: November 27, 2019 17:15
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Li, Dennis <Dennis.Li@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Ma, Le <Le.Ma@xxxxxxx>
Subject: [PATCH 06/10] drm/amdgpu: add condition to enable baco for xgmi/ras case

Avoid changing the default reset behavior for production cards by enabling BACO only when amdgpu_ras_enable equals 2. In addition, only sufficiently new SMU ucode supports BACO for the xgmi/ras case.

Change-Id: I07c3e6862be03e068745c73db8ea71f428ecba6b
Signed-off-by: Le Ma <le.ma@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 951327f..6202333 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -577,7 +577,9 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
 			struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, 0);
 			struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 
-			if (hive || (ras && ras->supported))
+			if ((hive || (ras && ras->supported)) &&
+			    (amdgpu_ras_enable != 2 ||
+			    adev->pm.fw_version <= 0x283400))
 				baco_reset = false;
 		}
 		break;
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



