On Wed, Aug 10, 2022 at 12:52 PM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
>
>
> Am 10.08.22 um 10:50 schrieb Dusica Milinkovic:
> > [Why]
> > During multi-vf executing benchmark (Luxmark) observed kiq error timeout.
> > It happens because all of the VFs do the tlb invalidation at the same time.
> > Although each VF has the invalidate register set, from hardware side
> > the invalidate requests are queued for execution.
> >
> > [How]
> > In case of 12 VFs, increase the timeout to 12*100ms
> >
> > Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 6 +++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 +++++-
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > index 9ae8cdaa033e..5743975efea5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > @@ -419,6 +419,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> >         uint32_t seq;
> >         uint16_t queried_pasid;
> >         bool ret;
> > +       uint32_t sriov_usec_timeout = 1200000;  /* wait for 12 * 100ms for SRIOV */
>
> Please put that as a define into some header and never ever write
> comments at the same line after a define.
>
> >
> >         struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
> >         struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> >
> > @@ -437,7 +438,10 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> >
> >                 amdgpu_ring_commit(ring);
> >                 spin_unlock(&adev->gfx.kiq.ring_lock);
> > -               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> > +               if (amdgpu_sriov_vf(adev))
> > +                       r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
> > +               else
> > +                       r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>
> Don't duplicate the whole call, just change the parameter.

Per this, see my comment in the previous version of this patch.
Alex

>
> Regards,
> Christian.
>
> >                 if (r < 1) {
> >                         dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
> >                         return -ETIME;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index ab89d91975ab..bab26982b3f9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -896,6 +896,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> >         uint32_t seq;
> >         uint16_t queried_pasid;
> >         bool ret;
> > +       uint32_t sriov_usec_timeout = 1200000;  /* wait for 12 * 100ms for SRIOV */
> >         struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
> >         struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> >
> > @@ -935,7 +936,10 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> >
> >                 amdgpu_ring_commit(ring);
> >                 spin_unlock(&adev->gfx.kiq.ring_lock);
> > -               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> > +               if (amdgpu_sriov_vf(adev))
> > +                       r = amdgpu_fence_wait_polling(ring, seq, sriov_usec_timeout);
> > +               else
> > +                       r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> >                 if (r < 1) {
> >                         dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
> >                         up_read(&adev->reset_domain->sem);
>