RE: [PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Public Use]

 

Responses inline

 

From: Chen, Guchun <Guchun.Chen@xxxxxxx>
Sent: Friday, May 8, 2020 3:11 PM
To: Clements, John <John.Clements@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: RE: [PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

 

[AMD Public Use]

 

Patch 1,

1. maybe it’s better to split it into 2, one is for header file change, and another is functional patch.

2.  Can below codes be merged to one return instead? Variable ‘en’ can serve the function input in smu_send_smc_msg_with_param.

[JC] I think what you are suggesting would probably work, but my concern is that it is not immediate clear to me if bool ec will compile to 1. The spec for the SMU cmd defines 1 to enable and 0 to disable

if (en)

+                           return smu_send_smc_msg_with_param(smu,

+                                                                                       SMU_MSG_GmiPwrDnControl,

+                                                                                       1,

+                                                                                       NULL);

+

+            return smu_send_smc_msg_with_param(smu,

+                                                                         SMU_MSG_GmiPwrDnControl,

+                                                                         0,

+                                                                         NULL);

 

Patch 2: Reviewed-by: Guchun Chen guchun.chen@xxxxxxx

 

Patch 3:

1.function amdgpu_ras_error_inject_xgmi should be one static function? [JC] That makes sense, I’ll update function to be static

2. Regarding below return case, is there one dpm_allow_xgmi_power_down leak? As we have disabled xgmi link power down.

[JC] I intentionally did this, when the RAS int is triggered I don’t want to create any races conditions to access the SMU to perform functions and entering BACO reset, the XGMI per link settings will be restored anyway after the BACO reset.

[JC] That being said I understand that this looks like it could be a bug, I’ll add comments there to clarify the sequence

+            if (amdgpu_ras_intr_triggered())

+                           return ret;

 

Regards,

Guchun

 

From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Clements, John
Sent: Friday, May 8, 2020 2:51 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: [PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

 

[AMD Official Use Only - Internal Distribution Only]

 

Submitting 3 patches for review:

  • Added host to SMU FW cmd to enable/disable XGMI link power down
  • Added DPM function for XGMI link power down control
  • Disable XGMI link power down prior to issuing a XGMI RAS error
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux