RE: [PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>> Now what happens is that clearing the VM page tables runs asynchronously to the exclusive fence which moves the buffer around.

The amdgpu_vm_clear_freed  is already kicked off before you wait for the exclusive fence signaled,  why you can avoid clearing PT not overlap with the "move" action after you introduce a " dma_resv_wait_timeout_rcu"
*after* the amdgpu_vm_clear_freed () ?
_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> 
Sent: Thursday, March 12, 2020 8:50 PM
To: Liu, Monk <Monk.Liu@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close

 From the semantic the dma_resv object contains a single exclusive and multiple shared fences and it is mandatory that the shared fences complete after the exclusive one.

Now what happens is that clearing the VM page tables runs asynchronously to the exclusive fence which moves the buffer around.

And because of this Xinhui ran into a rare problem that TTM thought it could reuse the memory while in reality it was still in use.

Regards,
Christian.

Am 12.03.20 um 12:30 schrieb Liu, Monk:
> Can you give more details about " we can't guarantee the the clear fence will complete after the exclusive one." ?
>
> Thanks
>
> _____________________________________
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of 
> Christian K?nig
> Sent: Thursday, March 12, 2020 7:12 PM
> To: Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: [PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close
>
> The problem is that we can't add the clear fence to the BO when there is an exclusive fence on it since we can't guarantee the the clear fence will complete after the exclusive one.
>
> To fix this refactor the function and wait for any potential exclusive fence with a small timeout before adding the shared clear fence.
>
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 41 +++++++++++++++----------
>   1 file changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 4277125a79ee..0782162fb5f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -161,10 +161,11 @@ void amdgpu_gem_object_close(struct 
> drm_gem_object *obj,
>   
>   	struct amdgpu_bo_list_entry vm_pd;
>   	struct list_head list, duplicates;
> +	struct dma_fence *fence = NULL;
>   	struct ttm_validate_buffer tv;
>   	struct ww_acquire_ctx ticket;
>   	struct amdgpu_bo_va *bo_va;
> -	int r;
> +	long r;
>   
>   	INIT_LIST_HEAD(&list);
>   	INIT_LIST_HEAD(&duplicates);
> @@ -182,24 +183,32 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
>   		return;
>   	}
>   	bo_va = amdgpu_vm_bo_find(vm, bo);
> -	if (bo_va && --bo_va->ref_count == 0) {
> -		amdgpu_vm_bo_rmv(adev, bo_va);
> +	if (!bo_va || --bo_va->ref_count)
> +		goto out_unlock;
>   
> -		if (amdgpu_vm_ready(vm)) {
> -			struct dma_fence *fence = NULL;
> +	amdgpu_vm_bo_rmv(adev, bo_va);
> +	if (!amdgpu_vm_ready(vm))
> +		goto out_unlock;
>   
> -			r = amdgpu_vm_clear_freed(adev, vm, &fence);
> -			if (unlikely(r)) {
> -				dev_err(adev->dev, "failed to clear page "
> -					"tables on GEM object close (%d)\n", r);
> -			}
>   
> -			if (fence) {
> -				amdgpu_bo_fence(bo, fence, true);
> -				dma_fence_put(fence);
> -			}
> -		}
> -	}
> +	r = amdgpu_vm_clear_freed(adev, vm, &fence);
> +	if (r || !fence)
> +		goto out_unlock;
> +
> +	r = dma_resv_wait_timeout_rcu(bo->tbo.base.resv, false, false,
> +				      msecs_to_jiffies(10));
> +	if (r == 0)
> +		r = -ETIMEDOUT;
> +	if (r)
> +		goto out_unlock;
> +
> +	amdgpu_bo_fence(bo, fence, true);
> +	dma_fence_put(fence);
> +
> +out_unlock:
> +	if (unlikely(r))
> +		dev_err(adev->dev, "failed to clear page "
> +			"tables on GEM object close (%d)\n", r);
>   	ttm_eu_backoff_reservation(&ticket, &list);  }
>   
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7CMo
> nk.Liu%40amd.com%7C26730e56c5b944f8cbb408d7c683d4a1%7C3dd8961fe4884e60
> 8e11a82d994e183d%7C0%7C0%7C637196141815929915&amp;sdata=yP5Yc1BWYWS93f
> 0hHERUfECmShwyQ5fbMkhCeG82n6M%3D&amp;reserved=0

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux