>-----Original Message----- >From: Zhou, David(ChunMing) >Sent: Tuesday, September 11, 2018 11:03 AM >To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. > > > >On 2018å¹´09æ??11æ?¥ 10:51, Emily Deng wrote: >> It will ramdomly have the dead lock issue when test TDR: >> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow >> is waiting for the shadow_list_lock 4. >> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >> lock. >> >> v2: >> Make a local copy of the list >> >> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >++++++++++++++++++++- >> 1 file changed, 20 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 2a21267..8c81404 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3105,6 +3105,9 @@ static int >amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> long r = 1; >> int i = 0; >> long tmo; >> + struct list_head local_shadow_list; >> + >> + INIT_LIST_HEAD(&local_shadow_list); >> >> if (amdgpu_sriov_runtime(adev)) >> tmo = msecs_to_jiffies(8000); >> @@ -3112,8 +3115,19 @@ static int >amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> tmo = msecs_to_jiffies(100); >> >> DRM_INFO("recover vram bo from shadow start\n"); >> + >> + mutex_lock(&adev->shadow_list_lock); >> + list_splice_init(&adev->shadow_list, &local_shadow_list); >> + mutex_unlock(&adev->shadow_list_lock); >> + >> + >> mutex_lock(&adev->shadow_list_lock); >local_shadow_list is a local variable, I think it doesn't need lock at all, no one >change it. Otherwise looks good to me. The bo->shadow_list which now is in local_shadow_list maybe destroy in case that it already in amdgpu_bo_destroy, then it will change local_shadow_list, so need lock the shadow_list_lock. Best wishes Emily Deng >Thanks, >David Zhou >> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) { >> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { >> + mutex_unlock(&adev->shadow_list_lock); >> + >> + if (!bo) >> + continue; >> + >> next = NULL; >> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >&next); >> if (fence) { >> @@ -3132,9 +3146,14 @@ static int >> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >> >> dma_fence_put(fence); >> fence = next; >> + mutex_lock(&adev->shadow_list_lock); >> } >> mutex_unlock(&adev->shadow_list_lock); >> >> + mutex_lock(&adev->shadow_list_lock); >> + list_splice_init(&local_shadow_list, &adev->shadow_list); >> + mutex_unlock(&adev->shadow_list_lock); >> + >> if (fence) { >> r = dma_fence_wait_timeout(fence, false, tmo); >> if (r == 0)