On 09.02.2017 11:33, Samuel Pitoiset wrote: > When ttm_bo_init() fails, the reservation mutex should be unlocked. > > In debug build, the kernel reported "possible recursive locking > detected" in this codepath. For debugging purposes, I also added > a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the > mutex was locked as expected. > > This should fix (random) GPU hangs. The easy way to reproduce the > issue is to change the "Super Sampling" option from 1.0 to 2.0 in > Hitman. It will create a huge buffer, evict a bunch of buffers > (around ~5k) and deadlock. > > This regression has been introduced pretty recently. > > v2: only release the mutex if resv is NULL > > Fixes: 12a852219583 ("drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)") > Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index d1ef1d064de4..556236a112c1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -403,8 +403,11 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev, > &bo->placement, page_align, !kernel, NULL, > acc_size, sg, resv ? resv : &bo->tbo.ttm_resv, > &amdgpu_ttm_bo_destroy); > - if (unlikely(r != 0)) > + if (unlikely(r != 0)) { > + if (!resv) > + ww_mutex_unlock(&bo->tbo.resv->lock); > return r; > + } I was looking at this myself a couple of weeks back, and I'm pretty sure I had this exact same patch just to realize that it's actually incorrect. The problem is that ttm_bo_init will actually call the destroy function (in our case, amdgpu_ttm_bo_destroy), so at this point, bo has been freed. This code is a huge mess. I'm surprised though: have you verified that this patch actually fixes a hang? Cheers, Nicolai > > bo->tbo.priority = ilog2(bo->tbo.num_pages); > if (kernel) >