Am 10.09.2018 um 09:19 schrieb Deng, Emily: >> -----Original Message----- >> From: Christian König <ckoenig.leichtzumerken at gmail.com> >> Sent: Monday, September 10, 2018 3:06 PM >> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >> >> Am 10.09.2018 um 06:07 schrieb Emily Deng: >>> It will ramdomly have the dead lock issue when test TDR: >>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >>> is waiting for the shadow_list_lock 4. >>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>> lock. >>> >>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >> Good catch, problem is the fix doesn't work like this because the lock won't be >> dropped at this point in most cases. > Sorry, could you explain more, why the lock won't be dropped after calling reservation_object_unlock? reservation_object_unlock won't be called for most BOs. E.g. the shadow is used for page directories and all page directories use the root reservation object and so have bp->resv set here. Christian. >> Instead you need to fix amdgpu_device_recover_vram_from_shadow to make >> a local copy of the list. >> >> You can do this by grabbing a reference to each BO and moving it to a local >> list. >> >> Regards, >> Christian. >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> index de990bd..c75447d 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct >> amdgpu_device *adev, >>> bp.resv = bo->tbo.resv; >>> >>> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); >>> - if (!r) { >>> + if (!r) >>> bo->shadow->parent = amdgpu_bo_ref(bo); >>> - mutex_lock(&adev->shadow_list_lock); >>> - list_add_tail(&bo->shadow_list, &adev->shadow_list); >>> - mutex_unlock(&adev->shadow_list_lock); >>> - } >>> >>> return r; >>> } >>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device >> *adev, >>> if (!bp->resv) >>> reservation_object_unlock((*bo_ptr)->tbo.resv); >>> >>> + if (!r) { >>> + mutex_lock(&adev->shadow_list_lock); >>> + list_add_tail(&(*bo_ptr)->shadow_list, &adev- >>> shadow_list); >>> + mutex_unlock(&adev->shadow_list_lock); >>> + } >>> + >>> if (r) >>> amdgpu_bo_unref(bo_ptr); >>> }