>-----Original Message----- >From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of >Christian König >Sent: Monday, September 10, 2018 5:49 PM >To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >Subject: Re: [PATCH v2] drm/amdgpu: Fix the dead lock issue. > >Am 10.09.2018 um 11:47 schrieb Deng, Emily: >>> -----Original Message----- >>> From: Christian König <ckoenig.leichtzumerken at gmail.com> >>> Sent: Monday, September 10, 2018 5:41 PM >>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >>> Subject: Re: [PATCH v2] drm/amdgpu: Fix the dead lock issue. >>> >>> Am 10.09.2018 um 11:34 schrieb Emily Deng: >>>> It will ramdomly have the dead lock issue when test TDR: >>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>> amdgpu_bo_create locked the bo's resv lock 3. >>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>> lock. >>>> >>>> v2: >>>> Make a local copy of the list >>>> >>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++++++++++- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + >>>> 2 files changed, 12 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index acfc63e..2b9f597 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3006,6 +3006,9 @@ static int >>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>> long r = 1; >>>> int i = 0; >>>> long tmo; >>>> + struct list_head local_shadow_list; >>>> + >>>> + INIT_LIST_HEAD(&local_shadow_list); >>>> >>>> if (amdgpu_sriov_runtime(adev)) >>>> tmo = msecs_to_jiffies(8000); >>>> @@ -3013,8 +3016,15 @@ static int >>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>> tmo = msecs_to_jiffies(100); >>>> >>>> DRM_INFO("recover vram bo from shadow start\n"); >>>> + >>>> mutex_lock(&adev->shadow_list_lock); >>>> list_for_each_entry_safe(bo, tmp, &adev->shadow_list, >>>> shadow_list) { >>>> + amdgpu_bo_ref(bo); >>>> + list_add_tail(&bo->copy_shadow_list, &local_shadow_list); >>>> + } >>> Please don't add an extra copy_shadow_list field to amdgpu_bo. >> If don't use an extra variable, the local shadow list will change >> according the adev->shadow_list, both for adding or deleting, it is not we >want. > >That is not correct, see amdgpu_bo_destroy: >>        if (!list_empty(&bo->shadow_list)) { >>                mutex_lock(&adev->shadow_list_lock); >>                list_del_init(&bo->shadow_list); >>                mutex_unlock(&adev->shadow_list_lock); >>        } > >The BO is only removed from the list when it is destroyed, since we grabbed a >local reference it can't be destroyed. So we are safe here. Sorry I am not meaning the delete, what about the adding. >Regards, >Christian. > >> >>> Instead just use bo->shadow list for this. When you hold a reference >>> to the BO it should not be removed from the shadow list. >>> >>> Additional to that you can just use list_splice_init() to move the >>> whole shadow list to your local list. >>> >>> Christian. >>> >>>> + mutex_unlock(&adev->shadow_list_lock); >>>> + >>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, >>>> +copy_shadow_list) { >>>> next = NULL; >>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>> &next); >>>> if (fence) { >>>> @@ -3033,8 +3043,8 @@ static int >>> amdgpu_device_handle_vram_lost(struct >>>> amdgpu_device *adev) >>>> >>>> dma_fence_put(fence); >>>> fence = next; >>>> + amdgpu_bo_unref(&bo); >>>> } >>>> - mutex_unlock(&adev->shadow_list_lock); >>>> >>>> if (fence) { >>>> r = dma_fence_wait_timeout(fence, false, tmo); diff --git >>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> index 907fdf4..cfee16c 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>> @@ -103,6 +103,7 @@ struct amdgpu_bo { >>>> struct list_head mn_list; >>>> struct list_head shadow_list; >>>> }; >>>> + struct list_head copy_shadow_list; >>>> >>>> struct kgd_mem *kfd_bo; >>>> }; > >_______________________________________________ >amd-gfx mailing list >amd-gfx at lists.freedesktop.org >https://lists.freedesktop.org/mailman/listinfo/amd-gfx