Re: [Patch V2] drm/amdgpu: Increase tlb flush timeout for sriov

On 10.08.22 at 10:50, Dusica Milinkovic wrote:
[Why]
While running a benchmark (Luxmark) on multiple VFs, a KIQ timeout error was observed.
It happens because all of the VFs do the TLB invalidation at the same time.
Although each VF has its own invalidate register set, on the hardware side
the invalidate requests are queued for execution.

[How]
For the 12-VF case, increase the timeout to 12 * 100 ms.

Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 6 +++++-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 +++++-
  2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 9ae8cdaa033e..5743975efea5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -419,6 +419,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
  	uint32_t seq;
  	uint16_t queried_pasid;
  	bool ret;
+	uint32_t sriov_usec_timeout = 1200000;  /* wait for 12 * 100ms for SRIOV */

Please put that as a define into some header, and never ever write comments on the same line after a define.



  	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
  	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
@@ -437,7 +438,10 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
  		amdgpu_ring_commit(ring);
  		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		if (amdgpu_sriov_vf(adev))
+			r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
+		else
+			r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);

Don't duplicate the whole call, just change the parameter.

Regards,
Christian.

  		if (r < 1) {
  			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
  			return -ETIME;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index ab89d91975ab..bab26982b3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -896,6 +896,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
  	uint32_t seq;
  	uint16_t queried_pasid;
  	bool ret;
+	uint32_t sriov_usec_timeout = 1200000;  /* wait for 12 * 100ms for SRIOV */
  	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
  	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
@@ -935,7 +936,10 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
  		amdgpu_ring_commit(ring);
  		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		if (amdgpu_sriov_vf(adev))
+			r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
+		else
+			r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
  		if (r < 1) {
  			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
  			up_read(&adev->reset_domain->sem);
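
The two review comments above (move the value into a define, and change only the timeout parameter instead of duplicating the call) could be combined roughly as in the sketch below. This is not the follow-up patch, only a standalone illustration: the macro name SKETCH_SRIOV_USEC_TIMEOUT, struct sketch_device and the sketch_* functions are hypothetical stand-ins for a header define, struct amdgpu_device, amdgpu_sriov_vf() and amdgpu_fence_wait_polling(), so that the snippet compiles outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: wait up to 12 * 100ms when running as an SRIOV VF. */
#define SKETCH_SRIOV_USEC_TIMEOUT 1200000

/* Minimal stand-in for struct amdgpu_device, only what this sketch needs. */
struct sketch_device {
	/* Stands in for the result of amdgpu_sriov_vf(adev). */
	bool is_sriov_vf;
	/* Stands in for adev->usec_timeout. */
	long usec_timeout;
};

/* Timeout most recently handed to the stub, kept so the choice can be checked. */
static long sketch_last_timeout;

/* Stub for amdgpu_fence_wait_polling(): records the timeout, reports success. */
static long sketch_fence_wait_polling(uint32_t seq, long timeout)
{
	(void)seq;
	sketch_last_timeout = timeout;
	return 1;
}

/* Select the timeout first, then issue exactly one wait call. */
static long sketch_wait_for_kiq_fence(const struct sketch_device *adev,
				      uint32_t seq)
{
	long usec_timeout = adev->is_sriov_vf ?
		SKETCH_SRIOV_USEC_TIMEOUT : adev->usec_timeout;

	return sketch_fence_wait_polling(seq, usec_timeout);
}
```

With this shape the wait call appears once per function, and the SRIOV value lives in one place rather than as a local variable repeated in both gmc_v9_0.c and gmc_v10_0.c.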



