>-----Original Message----- >From: Koenig, Christian >Sent: Monday, September 10, 2018 3:23 PM >To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > >Am 10.09.2018 um 09:19 schrieb Deng, Emily: >>> -----Original Message----- >>> From: Christian König <ckoenig.leichtzumerken at gmail.com> >>> Sent: Monday, September 10, 2018 3:06 PM >>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>> >>> Am 10.09.2018 um 06:07 schrieb Emily Deng: >>>> It will ramdomly have the dead lock issue when test TDR: >>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>> amdgpu_bo_create locked the bo's resv lock 3. >>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>> lock. >>>> >>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >>> Good catch, problem is the fix doesn't work like this because the >>> lock won't be dropped at this point in most cases. >> Sorry, could you explain more, why the lock won't be dropped after calling >reservation_object_unlock? > >reservation_object_unlock won't be called for most BOs. > >E.g. the shadow is used for page directories and all page directories use the >root reservation object and so have bp->resv set here. Thanks, will modify as your suggestion. Best wishes Emily Deng >Christian. > >>> Instead you need to fix amdgpu_device_recover_vram_from_shadow to >>> make a local copy of the list. >>> >>> You can do this by grabbing a reference to each BO and moving it to a >>> local list. >>> >>> Regards, >>> Christian. >>> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++----- >>>> 1 file changed, 7 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> index de990bd..c75447d 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct >>> amdgpu_device *adev, >>>> bp.resv = bo->tbo.resv; >>>> >>>> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow); >>>> - if (!r) { >>>> + if (!r) >>>> bo->shadow->parent = amdgpu_bo_ref(bo); >>>> - mutex_lock(&adev->shadow_list_lock); >>>> - list_add_tail(&bo->shadow_list, &adev->shadow_list); >>>> - mutex_unlock(&adev->shadow_list_lock); >>>> - } >>>> >>>> return r; >>>> } >>>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device >>> *adev, >>>> if (!bp->resv) >>>> reservation_object_unlock((*bo_ptr)->tbo.resv); >>>> >>>> + if (!r) { >>>> + mutex_lock(&adev->shadow_list_lock); >>>> + list_add_tail(&(*bo_ptr)->shadow_list, &adev- >>>> shadow_list); >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + } >>>> + >>>> if (r) >>>> amdgpu_bo_unref(bo_ptr); >>>> }