Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks a lot, please let me know.

Andrey

On 2021-12-24 3:58 a.m., Deng, Emily wrote:
These patches look good to me. JingWen will pull these patches and do some basic TDR test on sriov environment, and give feedback.

Best wishes
Emily Deng



-----Original Message-----
From: Liu, Monk <Monk.Liu@xxxxxxx>
Sent: Thursday, December 23, 2021 6:14 PM
To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Grodzovsky, Andrey
<Andrey.Grodzovsky@xxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; amd-
gfx@xxxxxxxxxxxxxxxxxxxxx; Chen, Horace <Horace.Chen@xxxxxxx>; Chen,
JingWen <JingWen.Chen2@xxxxxxx>; Deng, Emily <Emily.Deng@xxxxxxx>
Cc: daniel@xxxxxxxx
Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

[AMD Official Use Only]

@Chen, Horace @Chen, JingWen @Deng, Emily

Please take a review on Andrey's patch

Thanks
-------------------------------------------------------------------
Monk Liu | Cloud GPU & Virtualization Solution | AMD
-------------------------------------------------------------------
we are hiring software manager for CVS core team
-------------------------------------------------------------------

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, December 23, 2021 4:42 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; dri-
devel@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: daniel@xxxxxxxx; Liu, Monk <Monk.Liu@xxxxxxx>; Chen, Horace
<Horace.Chen@xxxxxxx>
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:
Since now flr work is serialized against  GPU resets there is no need
for this.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Acked-by: Christian König <christian.koenig@xxxxxxx>

---
   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 -----------
   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 -----------
   2 files changed, 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 487cd654b69e..7d59a66e3988 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct
work_struct *work)
   	struct amdgpu_device *adev = container_of(virt, struct
amdgpu_device, virt);
   	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;

-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
   	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);

   	xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);

@@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct
work_struct *work)
   	} while (timeout > 1);

   flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
   	/* Trigger recovery for world switch failure if no TDR */
   	if (amdgpu_device_should_recover_gpu(adev)
   		&& (!amdgpu_device_has_job_running(adev) || diff --git
a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index e3869067a31d..f82c066c8e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct
work_struct *work)
   	struct amdgpu_device *adev = container_of(virt, struct
amdgpu_device, virt);
   	int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;

-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
   	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);

   	xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);

@@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct
work_struct *work)
   	} while (timeout > 1);

   flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
   	/* Trigger recovery for world switch failure if no TDR */
   	if (amdgpu_device_should_recover_gpu(adev)
   		&& (!amdgpu_device_has_job_running(adev) ||



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux