On 27.02.2018 at 09:47, Monk Liu wrote:
> issue:
> sometimes the GFX/MM IB test hits a timeout under the SRIOV env; the
> root cause is that the engine doesn't come back soon enough, so the
> current IB test is considered timed out.
>
> fix:
> for the SRIOV GFX IB test the wait time needs to be expanded a lot in
> SRIOV runtime mode, since the test can't really begin before the GFX
> engine comes back.
>
> the SRIOV MM IB test always needs more time, since MM scheduling does
> not go together with the GFX engine; it is controlled by the h/w MM
> scheduler, so no matter runtime or exclusive mode, the MM IB test
> always needs more time.
>
> v2:
> use the ring type instead of the index to judge
>
> Change-Id: I0342371bc073656476ad850e1f5d9a021846dc8c
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 7f2c314..d66171f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -322,14 +322,44 @@ int amdgpu_ib_ring_tests(struct amdgpu_device *adev)
>  {
>  	unsigned i;
>  	int r, ret = 0;
> +	long tmo_gfx, tmo_mm;
> +
> +	tmo_mm = tmo_gfx = AMDGPU_IB_TEST_TIMEOUT;
> +	if (amdgpu_sriov_vf(adev)) {
> +		/* for MM engines in hypervisor side they are not scheduled together
> +		 * with CP and SDMA engines, so even in exclusive mode MM engine could
> +		 * still running on other VF thus the IB TEST TIMEOUT for MM engines
> +		 * under SR-IOV should be set to a long time.
> +		 */
> +		tmo_mm = 8 * AMDGPU_IB_TEST_TIMEOUT; /* 8 sec should be enough for the MM comes back to this VF */

Why not add the "8 sec should be enough for the MM comes back to this
VF" part to the comment above?

Apart from that the patch is Reviewed-by: Christian König
<christian.koenig at amd.com>

Regards,
Christian.

> +	}
> +
> +	if (amdgpu_sriov_runtime(adev)) {
> +		/* for CP & SDMA engines since they are scheduled together so
> +		 * need to make the timeout width enough to cover the time
> +		 * cost waiting for it coming back under RUNTIME only
> +		 */
> +		tmo_gfx = 8 * AMDGPU_IB_TEST_TIMEOUT;
> +	}
>
>  	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>  		struct amdgpu_ring *ring = adev->rings[i];
> +		long tmo;
>
>  		if (!ring || !ring->ready)
>  			continue;
>
> -		r = amdgpu_ring_test_ib(ring, AMDGPU_IB_TEST_TIMEOUT);
> +		/* MM engine need more time */
> +		if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
> +			ring->funcs->type == AMDGPU_RING_TYPE_VCE ||
> +			ring->funcs->type == AMDGPU_RING_TYPE_UVD_ENC ||
> +			ring->funcs->type == AMDGPU_RING_TYPE_VCN_DEC ||
> +			ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC)
> +			tmo = tmo_mm;
> +		else
> +			tmo = tmo_gfx;
> +
> +		r = amdgpu_ring_test_ib(ring, tmo);
>  		if (r) {
>  			ring->ready = false;
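
[Editor's illustration] The hunk above selects the timeout by checking five
multimedia ring types inline. The same decision could be expressed as a small
predicate helper; the following is only a sketch built from the structures
visible in the patch (struct amdgpu_ring, ring->funcs->type and the
AMDGPU_RING_TYPE_* constants). The helper name amdgpu_ring_is_mm_engine is
hypothetical and not part of the amdgpu code base.

/* Sketch only (hypothetical helper, not in the amdgpu tree): group the
 * multimedia ring types named in the patch so the loop in
 * amdgpu_ib_ring_tests() can pick its timeout with a single call.
 */
static bool amdgpu_ring_is_mm_engine(struct amdgpu_ring *ring)
{
	switch (ring->funcs->type) {
	case AMDGPU_RING_TYPE_UVD:
	case AMDGPU_RING_TYPE_VCE:
	case AMDGPU_RING_TYPE_UVD_ENC:
	case AMDGPU_RING_TYPE_VCN_DEC:
	case AMDGPU_RING_TYPE_VCN_ENC:
		return true;	/* MM engines, scheduled by the h/w MM scheduler */
	default:
		return false;	/* CP/SDMA and other engines */
	}
}

/* possible use in the ring loop of the patch:
 *	tmo = amdgpu_ring_is_mm_engine(ring) ? tmo_mm : tmo_gfx;
 *	r = amdgpu_ring_test_ib(ring, tmo);
 */

Both forms are equivalent; the inline checks in the patch simply keep the
change local to amdgpu_ib_ring_tests().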