[Public]
One comment below; other than that, looks good to me. Feel free to add my RB if you send a v2.
Regards,
Rajneesh
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Thursday, October 17, 2024 5:10 AM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@xxxxxxx>; Errabolu, Ramesh <Ramesh.Errabolu@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Fix the logic for NPS request failure
On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
issue in error condition, should have been mutex_unlock.
Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
Fixes: 44d5206ec07c ("drm/amdgpu: Place NPS mode request on unload")
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 27 +++++++++++++-----------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fcdbcff57632..d2c25af2c5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -1586,26 +1586,29 @@ int amdgpu_xgmi_request_nps_change(struct amdgpu_device *adev,
 	 * devices don't request anymore.
 	 */
 	mutex_lock(&hive->hive_lock);
+	if (atomic_read(&hive->requested_nps_mode) ==
+	    UNKNOWN_MEMORY_PARTITION_MODE) {
+		mutex_unlock(&hive->hive_lock);
Maybe a warning or debug print here is useful?
+		return 0;
+	}
 	list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
 		r = adev->gmc.gmc_funcs->request_mem_partition_mode(
 			tmp_adev, req_nps_mode);
 		if (r)
-			goto err;
+			break;
+	}
+	if (r) {
+		/* Request back current mode if one of the requests failed */
+		cur_nps_mode =
+			adev->gmc.gmc_funcs->query_mem_partition_mode(tmp_adev);
+		list_for_each_entry_continue_reverse(
+			tmp_adev, &hive->device_list, gmc.xgmi.head)
+			adev->gmc.gmc_funcs->request_mem_partition_mode(
+				tmp_adev, cur_nps_mode);
 	}
 	/* Set to UNKNOWN so that other devices don't request anymore */
 	atomic_set(&hive->requested_nps_mode, UNKNOWN_MEMORY_PARTITION_MODE);
-
 	mutex_unlock(&hive->hive_lock);
-	return 0;
-err:
-	/* Request back current mode if one of the requests failed */
-	cur_nps_mode = adev->gmc.gmc_funcs->query_mem_partition_mode(tmp_adev);
-	list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list,
-					     gmc.xgmi.head)
-		adev->gmc.gmc_funcs->request_mem_partition_mode(tmp_adev,
-								cur_nps_mode);
-	mutex_lock(&hive->hive_lock);
-
 	return r;
 }
--
2.25.1
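
For anyone following the thread, the control flow after this patch can be modeled in plain userspace C. This is only an illustrative sketch, not driver code: every `model_*` name, the `nps_mode` enum, and the integer lock flag are hypothetical stand-ins for the amdgpu types, and the array walk stands in for the kernel list iteration. It shows the three behaviors the commit message describes: skip the request when the mode is already UNKNOWN, roll back devices that accepted the request when a later one fails, and release the lock exactly once on every path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real amdgpu definitions. */
enum nps_mode { NPS_UNKNOWN = 0, NPS1 = 1, NPS4 = 4 };

#define MAX_DEVS 4

struct model_dev {
	enum nps_mode mode;	/* mode currently set on the device */
	int fail_request;	/* non-zero: requests on this device fail */
};

struct model_hive {
	int locked;			/* models hive_lock: 0 = free, 1 = held */
	enum nps_mode requested_mode;	/* models hive->requested_nps_mode */
	size_t ndevs;
	struct model_dev devs[MAX_DEVS];
};

static void model_lock(struct model_hive *h)
{
	assert(!h->locked);	/* locking twice is the bug the patch removes */
	h->locked = 1;
}

static void model_unlock(struct model_hive *h)
{
	assert(h->locked);
	h->locked = 0;
}

static int model_request(struct model_dev *d, enum nps_mode mode)
{
	if (d->fail_request)
		return -1;
	d->mode = mode;
	return 0;
}

/* Mirrors the patched flow: early exit, request loop, rollback, mark UNKNOWN. */
static int model_request_nps_change(struct model_hive *h, enum nps_mode req)
{
	enum nps_mode cur;
	size_t i;
	int r = 0;

	model_lock(h);
	if (h->requested_mode == NPS_UNKNOWN) {
		/* An earlier device already handled (or failed) the request. */
		model_unlock(h);
		return 0;
	}
	for (i = 0; i < h->ndevs; i++) {
		r = model_request(&h->devs[i], req);
		if (r)
			break;
	}
	if (r) {
		/* The failing device still holds the old mode; restore it on
		 * the devices before it, walking backwards. */
		cur = h->devs[i].mode;
		while (i-- > 0)
			model_request(&h->devs[i], cur);
	}
	/* Set to UNKNOWN on success and failure so nobody requests again. */
	h->requested_mode = NPS_UNKNOWN;
	model_unlock(h);
	return r;
}
```

In this model, a failure on the last of three devices returns the error, leaves all devices in their original mode, and marks the hive mode UNKNOWN, so a second call returns 0 without touching any device. The asserts in `model_lock()` and `model_unlock()` would trip on the old error path, which took the lock a second time instead of releasing it.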