RE: [RESEND PATCH 4/5] Subject: drm/amdgpu: Redo XGMI reset synchronization.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - Internal Distribution Only]

 

 

 

-----Original Message-----
From: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Sent: Thursday, December 12, 2019 4:39 AM
To: dri-devel@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Ma, Le <Le.Ma@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
Subject: [RESEND PATCH 4/5] Subject: drm/amdgpu: Redo XGMI reset synchronization.

 

Use task barrier in XGMI hive to synchronize ASIC resets across devices in XGMI hive.

 

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>

---

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 42 +++++++++++++++++++++++++-----

1 file changed, 36 insertions(+), 6 deletions(-)

 

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1d19edfa..e4089a0 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

@@ -67,6 +67,7 @@

#include "amdgpu_tmz.h"

 

 #include <linux/suspend.h>

+#include <drm/task_barrier.h>

 

 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");

MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");

@@ -2663,14 +2664,43 @@ static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)  {

           struct amdgpu_device *adev =

                       container_of(__work, struct amdgpu_device, xgmi_reset_work);

+          struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, 0);

 

-           if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)

-                       adev->asic_reset_res = (adev->in_baco == false) ?

-                                               amdgpu_device_baco_enter(adev->ddev) :

-                                               qamdgpu_device_baco_exit(adev->ddev);

-           else

-                       adev->asic_reset_res = amdgpu_asic_reset(adev);

+          /*

+          * Use task barrier to synchronize all xgmi reset works across the

+          * hive.

+          * task_barrier_enter and task_barrier_exit will block untill all the

+          * threads running the xgmi reset works reach those points. I assume

+          * guarantee of progress here for all the threads as the workqueue code

+          * creates new worker threads as needed by amount of work items in queue

+          * (see worker_thread) and also each thread sleeps in the barrir and by

+          * this yielding the CPU for other work threads to make progress.

+          */

[Le]: This comments can be adjusted since we switch to system_unbound_wq in patch #5.

+          if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {

+

+                      if (hive)

+                                  task_barrier_enter(&hive->tb);

[Le]: The multiple hive condition can be checked only once and moved to the location right after the assignment.

+

+                      adev->asic_reset_res = amdgpu_device_baco_enter(adev->ddev);

+

+                      if (adev->asic_reset_res)

+                                  goto fail;

+

+                      if (hive)

+                                  task_barrier_exit(&hive->tb);

[Le]: Same as above.

+

+                      adev->asic_reset_res = amdgpu_device_baco_exit(adev->ddev);

+

+                      if (adev->asic_reset_res)

+                                  goto fail;

+          } else {

+                      if (hive)

+                                  task_barrier_full(&hive->tb);

[Le]: Same as above.

 

With above addressed, Reviewed-by: Le Ma <Le.Ma@xxxxxxx>

 

Regards,

Ma Le

+

+                      adev->asic_reset_res =  amdgpu_asic_reset(adev);

+          }

 

+fail:

           if (adev->asic_reset_res)

                       DRM_WARN("ASIC reset failed with error, %d for drm dev, %s",

                                    adev->asic_reset_res, adev->ddev->unique);

--

2.7.4

 

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux