[PATCH 1/4] drm/amdgpu: Optimization of critical section

axie@xxxxxxx (axie) · Tue, 13 Jun 2017 02:29:17 -0400

On 2017-06-12 07:00 PM, Christian KÃ¶nig wrote:
> Am 12.06.2017 um 22:31 schrieb Alex Xie:
>> This patch is to move a loop of unref BOs and
>> several memory free function calls out of
>> critical sections.
>>
>> Signed-off-by: Alex Xie <AlexBin.Xie at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> index a664987..02c138f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> @@ -75,9 +75,12 @@ static void amdgpu_bo_list_destroy(struct 
>> amdgpu_fpriv *fpriv, int id)
>>           /* Another user may have a reference to this list still */
>>           mutex_lock(&list->lock);
>>           mutex_unlock(&list->lock);
>> +        mutex_unlock(&fpriv->bo_list_lock);
>>           amdgpu_bo_list_free(list);
>>       }
>> -    mutex_unlock(&fpriv->bo_list_lock);
>> +    else {
>> +        mutex_unlock(&fpriv->bo_list_lock);
>> +    }
>
> You could move the unlock of bo_list_lock even before the if.
>
> But since you pointed it out there is quite a bug in this function:
>>                 /* Another user may have a reference to this list 
>> still */
>>                 mutex_lock(&list->lock);
>>                 mutex_unlock(&list->lock);
> Not sure if that is still up to date, but that use case used to be 
> illegal with mutexes.
As I understand this piece of code, these mutex_lock and mutex_unlock 
pair are used to make sure all other tasks have finished
access of the bo list. Another side of this story is in function 
amdgpu_bo_list_get. These two piece of codes together make sure
we can safely destroy bo list.

Otherwise we can easily simplify these lockings.
Let me give an example here.

In function amdgpu_bo_list_get, assuming we change the code like this:
...
     down_read(&fpriv->bo_list_lock);
     result = idr_find(&fpriv->bo_list_handles, id);
     up_read(&fpriv->bo_list_lock);
     /**Line 1. Task A was scheduled away from CPU**/
     if (result)
         mutex_lock(&result->lock);
...

In function amdgpu_bo_list_destroy, assuming we change the code like this:
...
     down_write(&fpriv->bo_list_lock);
     list = idr_remove(&fpriv->bo_list_handles, id);
     up_write(&fpriv->bo_list_lock);
     if (list) {
         /* Another user may have a reference to this list still */
         mutex_lock(&list->lock);
         mutex_unlock(&list->lock);
         amdgpu_bo_list_free(list);
     }
}

When task A is running in function amdgpu_bo_list_get in line 1, CPU 
scheduler takes CPU away from task A.
Then Task B run function amdgpu_bo_list_destroy. Task B can run all the 
way to destroy and free mutex.
Later Task A is back to run. The mutex result->lock has been destroyed 
by task B. Now task A try to lock a mutex
which has been destroyed and freed.

>
> Christian.
>
>>   }
>>     static int amdgpu_bo_list_set(struct amdgpu_device *adev,
>
>