Re: [PATCH] drm/amdgpu: Increase tlb flush timeout for sriov

Christian König <christian.koenig@xxxxxxx> · Wed, 19 May 2021 13:49:04 +0200

Yeah, but you can't do that; it will probably trigger the watchdog timer.

The usec_timeout is named this way because it is a timeout in microseconds. 
Anything larger than 1ms is a no-go here.
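To make the concern concrete, here is a userspace model of a bounded polling wait on a fence sequence number, shaped after how amdgpu_fence_wait_polling() is used in this patch: the caller passes a budget in microseconds and treats a return value below 1 as a timeout. The struct, the fake_* helper names, and the 5 usec charge per poll are assumptions for illustration, not the driver's code. The model only charges the budget, whereas a real polling wait presumably keeps the CPU busy between reads instead of sleeping, which is why a multi-second budget is a problem.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ring's fence location (model only). */
struct fake_ring {
	uint32_t seq_now;          /* value the "GPU" has written so far */
	uint32_t seq_final;        /* value written once the work completes */
	uint32_t polls_until_done; /* poll count at which seq_final appears */
	uint32_t polls;            /* number of reads performed */
};

/* Simulated fence read: the sequence advances after a fixed number of polls. */
static uint32_t fake_fence_read(struct fake_ring *ring)
{
	ring->polls++;
	if (ring->polls >= ring->polls_until_done)
		ring->seq_now = ring->seq_final;
	return ring->seq_now;
}

/*
 * Model of a bounded polling wait. Returns the remaining budget in usec,
 * or 0 on timeout, so "r < 1" means the fence did not signal in time.
 * step_usec stands in for the delay between reads (assumed value).
 * In this model a fence that signals on the very last poll is still
 * reported as a timeout.
 */
static long fake_wait_polling(struct fake_ring *ring, uint32_t wait_seq,
			      long timeout_usec, long step_usec)
{
	uint32_t seq;

	do {
		seq = fake_fence_read(ring);
		/* A real busy-wait would spin here; the model only charges the budget. */
		timeout_usec -= step_usec;
		/* Signed difference keeps the comparison valid across 32-bit wraparound. */
	} while ((int32_t)(wait_seq - seq) > 0 && timeout_usec > 0);

	return timeout_usec > 0 ? timeout_usec : 0;
}
```

With the patch's 6,000,000 usec budget and a 5 usec step, a fence that never signals costs 1,200,000 polls, i.e. roughly six seconds during which the CPU does nothing else.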

When the other instances do an FLR, we don't really need to wait for the 
TLB flush anyway, since any FLR will kill it.

Christian.

On 19.05.21 at 13:08, Liu, Cheng Zhe wrote:

We support at most 12 VFs. In the worst case, the first 11 all fail IDLE and do an FLR, so it takes 11 * 500ms to switch to the 12th VF.
That is why I set 12 * 500ms for the timeout.
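The worst-case arithmetic behind the 6000000 value can be spelled out as follows. The macro and function names are hypothetical; the inputs (12 VFs, 500 ms per VF that fails IDLE and is reset, and the 1 ms ceiling from the review) are the figures quoted in this thread.

```c
#include <stdint.h>

/* Figures taken from the thread; the names are hypothetical. */
#define MAX_VFS            12u   /* maximum number of VFs supported */
#define FLR_SLOT_MSEC      500u  /* worst-case time per VF that fails IDLE and is FLRed */
#define USEC_PER_MSEC      1000u
#define REVIEW_LIMIT_USEC  1000u /* "Anything larger than 1ms is a no-go here." */

/* Time until the last VF gets its turn if every earlier VF fails IDLE and is FLRed. */
static uint32_t worst_case_switch_usec(uint32_t num_vfs)
{
	if (num_vfs == 0)
		return 0;
	return (num_vfs - 1u) * FLR_SLOT_MSEC * USEC_PER_MSEC;
}

/* The patch rounds up to one full slot per VF: 12 * 500 ms. */
static uint32_t patch_timeout_usec(uint32_t num_vfs)
{
	return num_vfs * FLR_SLOT_MSEC * USEC_PER_MSEC;
}
```

For 12 VFs the worst case is 5,500,000 usec and the patch's timeout is 6,000,000 usec, which is 6000 times the 1 ms ceiling the reviewer considers acceptable for a polled wait.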

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Wednesday, May 19, 2021 6:08 PM
To: Liu, Cheng Zhe <ChengZhe.Liu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Xiao, Jack <Jack.Xiao@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Wang, Kevin(Yang) <Kevin1.Wang@xxxxxxx>; Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Increase tlb flush timeout for sriov

On 19.05.21 at 11:32, Chengzhe Liu wrote:
When there are 12 VFs, we need to increase the timeout.

NAK, 6 seconds is way too long to wait polling on a fence.

Why should an invalidation take that long? The engines are per VF precisely to avoid that problem.

Christian.

Signed-off-by: Chengzhe Liu <ChengZhe.Liu@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 6 +++++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 +++++-
   2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index f02dc904e4cf..a5f005c5d0ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -404,6 +404,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
   	uint32_t seq;
   	uint16_t queried_pasid;
   	bool ret;
+	uint32_t sriov_usec_timeout = 6000000;  /* wait for 12 * 500ms for SRIOV */
   	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
   
@@ -422,7 +423,10 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
   
   		amdgpu_ring_commit(ring);
   		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		if (amdgpu_sriov_vf(adev))
+			r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
+		else
+			r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
   		if (r < 1) {
   			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
   			return -ETIME;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index ceb3968d8326..e4a18d8f75c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -857,6 +857,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
   	uint32_t seq;
   	uint16_t queried_pasid;
   	bool ret;
+	uint32_t sriov_usec_timeout = 6000000;  /* wait for 12 * 500ms for SRIOV */
   	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
   
@@ -896,7 +897,10 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
   
   		amdgpu_ring_commit(ring);
   		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		if (amdgpu_sriov_vf(adev))
+			r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
+		else
+			r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
   		if (r < 1) {
   			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
   			up_read(&adev->reset_sem);

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx