On 2018å¹´09æ??11æ?¥ 11:37, zhoucm1 wrote: > > > On 2018å¹´09æ??11æ?¥ 11:32, Deng, Emily wrote: >>> -----Original Message----- >>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of >>> zhoucm1 >>> Sent: Tuesday, September 11, 2018 11:28 AM >>> To: Deng, Emily <Emily.Deng at amd.com>; Zhou, David(ChunMing) >>> <David1.Zhou at amd.com>; amd-gfx at lists.freedesktop.org >>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>> >>> >>> >>> On 2018å¹´09æ??11æ?¥ 11:23, Deng, Emily wrote: >>>>> -----Original Message----- >>>>> From: Zhou, David(ChunMing) >>>>> Sent: Tuesday, September 11, 2018 11:03 AM >>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org >>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. >>>>> >>>>> >>>>> >>>>> On 2018å¹´09æ??11æ?¥ 10:51, Emily Deng wrote: >>>>>> It will ramdomly have the dead lock issue when test TDR: >>>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2. >>>>>> amdgpu_bo_create locked the bo's resv lock 3. >>>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4. >>>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv >>>>>> lock. >>>>>> >>>>>> v2: >>>>>> Â Â Â Â Â Make a local copy of the list >>>>>> >>>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com> >>>>>> --- >>>>>> Â Â Â drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 >>>>> ++++++++++++++++++++- >>>>>> Â Â Â 1 file changed, 20 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> index 2a21267..8c81404 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> @@ -3105,6 +3105,9 @@ static int >>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> Â Â Â Â Â Â Â long r = 1; >>>>>> Â Â Â Â Â Â Â int i = 0; >>>>>> Â Â Â Â Â Â Â long tmo; >>>>>> +Â Â Â struct list_head local_shadow_list; >>>>>> + >>>>>> +Â Â Â INIT_LIST_HEAD(&local_shadow_list); >>>>>> >>>>>> Â Â Â Â Â Â Â if (amdgpu_sriov_runtime(adev)) >>>>>> Â Â Â Â Â Â Â Â Â Â Â tmo = msecs_to_jiffies(8000); >>>>>> @@ -3112,8 +3115,19 @@ static int >>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> Â Â Â Â Â Â Â Â Â Â Â tmo = msecs_to_jiffies(100); >>>>>> >>>>>> Â Â Â Â Â Â Â DRM_INFO("recover vram bo from shadow start\n"); >>>>>> + >>>>>> +Â Â Â mutex_lock(&adev->shadow_list_lock); >>>>>> +Â Â Â list_splice_init(&adev->shadow_list, &local_shadow_list); >>>>>> +Â Â Â mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> + >>>>>> Â Â Â Â Â Â Â mutex_lock(&adev->shadow_list_lock); >>>>> local_shadow_list is a local variable, I think it doesn't need lock >>>>> at all, no one change it. Otherwise looks good to me. >>>> The bo->shadow_list which now is in local_shadow_list maybe destroy in >>>> case that it already in amdgpu_bo_destroy, then it will change >>> local_shadow_list, so need lock the shadow_list_lock. >>> Ah, sorry for noise, I forget you don't reference these BOs. >> Yes, I don't reference these Bos, as I found even reference these >> Bos, it still couldn't avoid the case that another process is already >> in amdgpu_bo_destroy. > ??? that shouldn't happen, the reference is belonged to list. But back > to here, we don't need reference them. > And since no shadow BO is added to local after splice, we'd better to > use list_next_entry to iterate the local shadow list instead of > list_for_each_entry_safe. > > Thanks, > David Zhou >>> Thanks, >>> David Zhou >>>> Best wishes >>>> Emily Deng >>>>> Thanks, >>>>> David Zhou >>>>>> -Â Â Â list_for_each_entry_safe(bo, tmp, &adev->shadow_list, >>>>>> shadow_list) { >>>>>> +Â Â Â list_for_each_entry_safe(bo, tmp, &local_shadow_list, >>>>>> shadow_list) { because shadow list doesn't take bo reference, we should give a amdgpu_bo_ref(bo) with attached patch before unlock. You can have a try. Thanks, David Zhou >>>>>> + mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> +Â Â Â Â Â Â Â if (!bo) >>>>>> +Â Â Â Â Â Â Â Â Â Â Â continue; >>>>>> + >>>>>> Â Â Â Â Â Â Â Â Â Â Â next = NULL; >>>>>> Â Â Â Â Â Â Â Â Â Â Â amdgpu_device_recover_vram_from_shadow(adev, ring, bo, >>>>> &next); >>>>>> Â Â Â Â Â Â Â Â Â Â Â if (fence) { >>>>>> @@ -3132,9 +3146,14 @@ static int >>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) >>>>>> >>>>>> Â Â Â Â Â Â Â Â Â Â Â dma_fence_put(fence); >>>>>> Â Â Â Â Â Â Â Â Â Â Â fence = next; >>>>>> +Â Â Â Â Â Â Â mutex_lock(&adev->shadow_list_lock); >>>>>> Â Â Â Â Â Â Â } >>>>>> Â Â Â Â Â Â Â mutex_unlock(&adev->shadow_list_lock); >>>>>> >>>>>> +Â Â Â mutex_lock(&adev->shadow_list_lock); >>>>>> +Â Â Â list_splice_init(&local_shadow_list, &adev->shadow_list); >>>>>> +Â Â Â mutex_unlock(&adev->shadow_list_lock); >>>>>> + >>>>>> Â Â Â Â Â Â Â if (fence) { >>>>>> Â Â Â Â Â Â Â Â Â Â Â r = dma_fence_wait_timeout(fence, false, tmo); >>>>>> Â Â Â Â Â Â Â Â Â Â Â if (r == 0) >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx at lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > _______________________________________________ > amd-gfx mailing list > amd-gfx at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-drm-amdgpu-changing-of-shadow-bo-reference-should-ta.patch Type: text/x-patch Size: 1652 bytes Desc: not available URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180911/5ab9b3e5/attachment.bin>