Am 10.09.2018 um 11:55 schrieb Deng, Emily: >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of >> Christian König >> Sent: Monday, September 10, 2018 5:49 PM >> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >> Subject: Re: [PATCH v2] drm/amdgpu: Fix the dead lock issue. >> >> Am 10.09.2018 um 11:47 schrieb Deng, Emily: >>>> -----Original Message----- >>>> From: Christian König <ckoenig.leichtzumerken at gmail.com> >>>> Sent: Monday, September 10, 2018 5:41 PM >>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >>>> Subject: Re: [PATCH v2] drm/amdgpu: Fix the dead lock issue. >>>> >>>> Am 10.09.2018 um 11:34 schrieb Emily Deng: >>>>> It will ramdomly have the dead lock issue when test TDR: >>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>>> amdgpu_bo_create locked the bo's resv lock 3. >>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>>> lock. >>>>> >>>>> v2: >>>>> Make a local copy of the list >>>>> >>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++++++++++- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + >>>>> 2 files changed, 12 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> index acfc63e..2b9f597 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>> @@ -3006,6 +3006,9 @@ static int >>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>> long r = 1; >>>>> int i = 0; >>>>> long tmo; >>>>> + struct list_head local_shadow_list; >>>>> + >>>>> + INIT_LIST_HEAD(&local_shadow_list); >>>>> >>>>> if (amdgpu_sriov_runtime(adev)) >>>>> tmo = msecs_to_jiffies(8000); >>>>> @@ -3013,8 +3016,15 @@ static int >>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>> tmo = msecs_to_jiffies(100); >>>>> >>>>> DRM_INFO("recover vram bo from shadow start\n"); >>>>> + >>>>> mutex_lock(&adev->shadow_list_lock); >>>>> list_for_each_entry_safe(bo, tmp, &adev->shadow_list, >>>>> shadow_list) { >>>>> + amdgpu_bo_ref(bo); >>>>> + list_add_tail(&bo->copy_shadow_list, &local_shadow_list); >>>>> + } >>>> Please don't add an extra copy_shadow_list field to amdgpu_bo. >>> If don't use an extra variable, the local shadow list will change >>> according the adev->shadow_list, both for adding or deleting, it is not we >> want. >> >> That is not correct, see amdgpu_bo_destroy: >>>        if (!list_empty(&bo->shadow_list)) { >>>                mutex_lock(&adev->shadow_list_lock); >>>                list_del_init(&bo->shadow_list); >>>                mutex_unlock(&adev->shadow_list_lock); >>>        } >> The BO is only removed from the list when it is destroyed, since we grabbed a >> local reference it can't be destroyed. So we are safe here. > Sorry I am not meaning the delete, what about the adding. That will still go to adev->shadow_list and not affect our local list in any way. We are not interested in any newly allocated shadow BOs, so that should be unproblematic. Regards, Christian. >> Regards, >> Christian. >> >>>> Instead just use bo->shadow list for this. When you hold a reference >>>> to the BO it should not be removed from the shadow list. >>>> >>>> Additional to that you can just use list_splice_init() to move the >>>> whole shadow list to your local list. >>>> >>>> Christian. >>>> >>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>> + >>>>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, >>>>> +copy_shadow_list) { >>>>> next = NULL; >>>>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>>> &next); >>>>> if (fence) { >>>>> @@ -3033,8 +3043,8 @@ static int >>>> amdgpu_device_handle_vram_lost(struct >>>>> amdgpu_device *adev) >>>>> >>>>> dma_fence_put(fence); >>>>> fence = next; >>>>> + amdgpu_bo_unref(&bo); >>>>> } >>>>> - mutex_unlock(&adev->shadow_list_lock); >>>>> >>>>> if (fence) { >>>>> r = dma_fence_wait_timeout(fence, false, tmo); diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>>> index 907fdf4..cfee16c 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h >>>>> @@ -103,6 +103,7 @@ struct amdgpu_bo { >>>>> struct list_head mn_list; >>>>> struct list_head shadow_list; >>>>> }; >>>>> + struct list_head copy_shadow_list; >>>>> >>>>> struct kgd_mem *kfd_bo; >>>>> }; >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx