[PATCH] drm/amdgpu: Fix the dead lock issue.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2018å¹´09æ??11æ?¥ 11:37, zhoucm1 wrote:
>
>
> On 2018å¹´09æ??11æ?¥ 11:32, Deng, Emily wrote:
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>> zhoucm1
>>> Sent: Tuesday, September 11, 2018 11:28 AM
>>> To: Deng, Emily <Emily.Deng at amd.com>; Zhou, David(ChunMing)
>>> <David1.Zhou at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>>>
>>>
>>>
>>> On 2018å¹´09æ??11æ?¥ 11:23, Deng, Emily wrote:
>>>>> -----Original Message-----
>>>>> From: Zhou, David(ChunMing)
>>>>> Sent: Tuesday, September 11, 2018 11:03 AM
>>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>>>>>
>>>>>
>>>>>
>>>>> On 2018å¹´09æ??11æ?¥ 10:51, Emily Deng wrote:
>>>>>> It will ramdomly have the dead lock issue when test TDR:
>>>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2.
>>>>>> amdgpu_bo_create locked the bo's resv lock 3.
>>>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4.
>>>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
>>>>>> lock.
>>>>>>
>>>>>> v2:
>>>>>>       Make a local copy of the list
>>>>>>
>>>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
>>>>> ++++++++++++++++++++-
>>>>>>     1 file changed, 20 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index 2a21267..8c81404 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -3105,6 +3105,9 @@ static int
>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>>>>>>         long r = 1;
>>>>>>         int i = 0;
>>>>>>         long tmo;
>>>>>> +    struct list_head local_shadow_list;
>>>>>> +
>>>>>> +    INIT_LIST_HEAD(&local_shadow_list);
>>>>>>
>>>>>>         if (amdgpu_sriov_runtime(adev))
>>>>>>             tmo = msecs_to_jiffies(8000);
>>>>>> @@ -3112,8 +3115,19 @@ static int
>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>>>>>>             tmo = msecs_to_jiffies(100);
>>>>>>
>>>>>>         DRM_INFO("recover vram bo from shadow start\n");
>>>>>> +
>>>>>> +    mutex_lock(&adev->shadow_list_lock);
>>>>>> +    list_splice_init(&adev->shadow_list, &local_shadow_list);
>>>>>> +    mutex_unlock(&adev->shadow_list_lock);
>>>>>> +
>>>>>> +
>>>>>>         mutex_lock(&adev->shadow_list_lock);
>>>>> local_shadow_list is a local variable, I think it doesn't need lock
>>>>> at all, no one change it. Otherwise looks good to me.
>>>> The bo->shadow_list which now is in local_shadow_list maybe destroy in
>>>> case that it already in amdgpu_bo_destroy, then it will change
>>> local_shadow_list, so need lock the shadow_list_lock.
>>> Ah, sorry for noise, I forget you don't reference these BOs.
>> Yes, I don't reference these Bos, as I found even reference these 
>> Bos, it still couldn't avoid the case that another process is already
>> in amdgpu_bo_destroy.
> ??? that shouldn't happen, the reference is belonged to list. But back 
> to here, we don't need reference them.
> And since no shadow BO is added to local after splice, we'd better to 
> use list_next_entry to iterate the local shadow list instead of 
> list_for_each_entry_safe.
>
> Thanks,
> David Zhou
>>> Thanks,
>>> David Zhou
>>>> Best wishes
>>>> Emily Deng
>>>>> Thanks,
>>>>> David Zhou
>>>>>> -    list_for_each_entry_safe(bo, tmp, &adev->shadow_list, 
>>>>>> shadow_list) {
>>>>>> +    list_for_each_entry_safe(bo, tmp, &local_shadow_list, 
>>>>>> shadow_list) {
because shadow list doesn't take bo reference, we should give a 
amdgpu_bo_ref(bo) with attached patch before unlock.
You can have a try.

Thanks,
David Zhou
>>>>>> + mutex_unlock(&adev->shadow_list_lock);
>>>>>> +
>>>>>> +        if (!bo)
>>>>>> +            continue;
>>>>>> +
>>>>>>             next = NULL;
>>>>>>             amdgpu_device_recover_vram_from_shadow(adev, ring, bo,
>>>>> &next);
>>>>>>             if (fence) {
>>>>>> @@ -3132,9 +3146,14 @@ static int
>>>>>> amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>>>>>>
>>>>>>             dma_fence_put(fence);
>>>>>>             fence = next;
>>>>>> +        mutex_lock(&adev->shadow_list_lock);
>>>>>>         }
>>>>>>         mutex_unlock(&adev->shadow_list_lock);
>>>>>>
>>>>>> +    mutex_lock(&adev->shadow_list_lock);
>>>>>> +    list_splice_init(&local_shadow_list, &adev->shadow_list);
>>>>>> +    mutex_unlock(&adev->shadow_list_lock);
>>>>>> +
>>>>>>         if (fence) {
>>>>>>             r = dma_fence_wait_timeout(fence, false, tmo);
>>>>>>             if (r == 0)
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amdgpu-changing-of-shadow-bo-reference-should-ta.patch
Type: text/x-patch
Size: 1652 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180911/5ab9b3e5/attachment.bin>


[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux