[PATCH 2/4] drm/amdgpu: workaround for VM fault caused by SDMA set_wptr

ckoenig.leichtzumerken@xxxxxxxxx (Christian König) · Fri, 13 Oct 2017 11:17:55 +0200

On 13.10.2017 at 10:26, Pixel Ding wrote:
> From: pding <Pixel.Ding at amd.com>
>
> The polling memory used to be standalone in VRAM, so the HDP flush
> introduced latency that hid a VM fault issue. Now that the polling
> memory uses the WB in system memory and no HDP flush is required, the
> VM fault at the same page shows up.
>
> Add the delay back as a workaround until the root cause is found.
>
> Tests: VP1 or launch 40 instances of glxinfo at the same time.
>
> Signed-off-by: pding <Pixel.Ding at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index b1de44f..5c4bbe1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -381,6 +381,9 @@ static void sdma_v3_0_ring_set_wptr(struct amdgpu_ring *ring)
>   	if (ring->use_doorbell) {
>   		/* XXX check if swapping is necessary on BE */
>   		adev->wb.wb[ring->wptr_offs] = lower_32_bits(ring->wptr) << 2;
> +		/* workaround: VM fault always happens at page 2046 */
> +		if (amdgpu_sriov_vf(adev))
> +			udelay(500);

Have you tried using a memory barrier here instead?

A 500us udelay() on every wptr update under SR-IOV looks like it will
have a massive impact on performance.
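
To make the question concrete, here is a minimal userspace sketch of the
ordering involved, not the driver code itself. wb_slot and doorbell are
hypothetical stand-ins for adev->wb.wb[ring->wptr_offs] and the doorbell
register, and the C11 release fence stands in for a kernel write barrier
such as wmb(). Whether a barrier actually avoids the VM fault is an open
question; this only shows where it would go.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the writeback slot polled by the engine. */
static uint32_t wb_slot;
/* Hypothetical stand-in for the doorbell register. */
static _Atomic uint32_t doorbell;

/* Publish a new write pointer: writeback slot first, then the doorbell. */
static void ring_set_wptr(uint64_t wptr)
{
	/* Same value the driver computes: lower 32 bits of wptr, shifted by 2. */
	uint32_t val = (uint32_t)wptr << 2;

	wb_slot = val;

	/*
	 * Assumption: a barrier replaces the udelay(500). The release fence
	 * orders the wb_slot store before the doorbell store below, so an
	 * observer that sees the new doorbell value (with an acquire load)
	 * also sees the new wb_slot value.
	 */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&doorbell, val, memory_order_relaxed);
}
```

Calling ring_set_wptr(5) stores 20 in both locations. Unlike the delay,
the fence costs no fixed wait time per submission.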

Regards,
Christian.

>   		WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr) << 2);
>   	} else {
>   		int me = (ring == &ring->adev->sdma.instance[0].ring) ? 0 : 1;