[PATCH v2 3/3] drm/amdgpu: fix lock cleanup during buffer creation

michel@xxxxxxxxxxx (Michel Dänzer) · Thu, 16 Feb 2017 10:00:54 +0900

On 16/02/17 04:10 AM, Nicolai HÃ¤hnle wrote:
> From: Nicolai HÃ¤hnle <nicolai.haehnle at amd.com>
> 
> Open-code the initial ttm_bo_validate call, so that we can properly
> unlock the reservation lock when it fails. Also, properly destruct
> the reservation object when the first part of TTM BO initialization
> fails.
> 
> Actual deadlocks caused by the missing unlock should have been fixed
> by "drm/ttm: never add BO that failed to validate to the LRU list",
> superseding the flawed fix in commit 38fc4856ad98 ("drm/amdgpu: fix
> a potential deadlock in amdgpu_bo_create_restricted()").
> 
> This change fixes remaining recursive locking errors that can be seen
> with lock debugging enabled, and avoids the error of freeing a locked
> mutex.
> 
> v2: use ttm_bo_unref for error handling, and rebase on latest changes
> 
> Fixes: 12a852219583 ("drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)")
> Signed-off-by: Nicolai HÃ¤hnle <nicolai.haehnle at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index c2e57f7..15944ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -404,15 +404,31 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>  	}
>  
>  	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
> -	r = ttm_bo_init(&adev->mman.bdev, &bo->tbo, size, type,
> -			&bo->placement, page_align, !kernel, NULL,
> -			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
> -			&amdgpu_ttm_bo_destroy);
> +	r = ttm_bo_init_top(&adev->mman.bdev, &bo->tbo, size, type,
> +			    page_align, NULL,
> +			    acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
> +			    &amdgpu_ttm_bo_destroy);
> +
> +	if (likely(r == 0))
> +		r = ttm_bo_validate(&bo->tbo, &bo->placement, !kernel, false);
> +
> +	if (unlikely(r != 0)) {
> +		struct ttm_buffer_object *tbo = &bo->tbo;
> +
> +		if (!resv)
> +			ww_mutex_unlock(&bo->tbo.ttm_resv.lock);
> +		ttm_bo_unref(&tbo);
> +		return r;
> +	}

I think this would be clearer as

	if (unlikely(r != 0)) {
		struct ttm_buffer_object *tbo = &bo->tbo;

		if (!resv)
			ww_mutex_unlock(&bo->tbo.ttm_resv.lock);
		ttm_bo_unref(&tbo);
		return r;
	}

	r = ttm_bo_validate(&bo->tbo, &bo->placement, !kernel, false);
	[...]

>  	amdgpu_cs_report_moved_bytes(adev,
>  		atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved);
>  
> -	if (unlikely(r != 0))
> -		return r;
> +	if (!(bo->tbo.mem.placement & TTM_PL_FLAG_NO_EVICT)) {
> +		spin_lock(&bo->tbo.glob->lru_lock);
> +		ttm_bo_add_to_lru(&bo->tbo);
> +		spin_unlock(&bo->tbo.glob->lru_lock);
> +	}

It's a bit ugly to open-code this logic in driver code. Maybe add
another TTM helper which calls ttm_bo_validate and ttm_bo_add_to_lru?

-- 
Earthling Michel DÃ¤nzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer