[PATCH 4/4] drm/amdgpu: Optimize mutex usage

Hi Christian,

Regarding comment 1.

Please read idr.c, lines 24 to 29.

Let me quote it here. Doesn't it surprise you? idr_alloc() only needs a
read lock.

...

  * Simultaneous modifications to the @idr are not allowed and should be
  * prevented by the user, usually with a lock.  idr_alloc() may be called
  * concurrently with read-only accesses to the @idr, such as idr_find() and
  * idr_for_each_entry().
  */
int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp)

...
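To make that concrete, here is a minimal sketch (hypothetical example_*
names, not from the patch) of the pattern the comment permits: a single
allocating context calls idr_alloc() under the read side of an rwsem,
concurrently with readers calling idr_find(). Note this is only safe as
long as no two contexts modify the idr at the same time.

#include <linux/idr.h>
#include <linux/rwsem.h>

static DEFINE_IDR(example_handles);
static DECLARE_RWSEM(example_lock);

/*
 * The single allocating context.  Per the comment above, idr_alloc()
 * may run concurrently with read-only accesses, so the read lock is
 * enough here -- but only because nothing else modifies the idr
 * concurrently.
 */
static int example_create(void *obj)
{
	int id;

	down_read(&example_lock);
	id = idr_alloc(&example_handles, obj, 1, 0, GFP_KERNEL);
	up_read(&example_lock);

	return id;
}

/* Readers may run concurrently with example_create(). */
static void *example_lookup(int id)
{
	void *obj;

	down_read(&example_lock);
	obj = idr_find(&example_handles, id);
	up_read(&example_lock);

	return obj;
}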

Regarding comment 2.

I have checked the relevant information about rw_semaphore. A read lock
is about as efficient as a mutex.

A write lock is less efficient than a mutex if "optimistic spinning on
lock acquisition" is not configured.

But we only take the write lock when destroying a bo list. Most of the
time, as you said, there is no contention.

So the write lock does not need to wait, and its lower efficiency is
not a factor compared with a mutex.


To decide which lock is better, choosing the correct locking mechanism
is more important than the raw efficiency of the lock itself. With the
current bo_list mutex, multiple readers can block each other.

What is even worse, the get function can wait for another lock while
still holding the mutex.
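For reference, here is the scenario, as a simplified copy of the
current mutex-based get function (the code itself is from the patch
quoted below; only the comment is mine):

struct amdgpu_bo_list *
amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id)
{
	struct amdgpu_bo_list *result;

	mutex_lock(&fpriv->bo_list_lock);
	result = idr_find(&fpriv->bo_list_handles, id);
	if (result)
		/*
		 * Potentially long wait, taken while bo_list_lock is
		 * still held: every other task that needs
		 * bo_list_lock -- even a pure reader that only wants
		 * idr_find() -- is blocked until we get result->lock.
		 */
		mutex_lock(&result->lock);
	mutex_unlock(&fpriv->bo_list_lock);
	return result;
}

With an rw_semaphore, a task sleeping at that point holds only the read
side, so other readers still proceed in parallel; only the destroy path
needs the write side.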


If, as you say, the user mode driver is single-threaded, then we don't
need any lock here at all...


-Alex Bin Xie



On 2017-06-12 06:54 PM, Christian König wrote:
> On 12.06.2017 at 22:31, Alex Xie wrote:
>> Use rw_semaphore instead of mutex for bo_lists.
>>
>> In the original function amdgpu_bo_list_get, the wait
>> for result->lock can be quite long while the mutex
>> bo_list_lock is held. This can make other tasks wait
>> on bo_list_lock for a long period too. Changing
>> bo_list_lock to an rw_semaphore avoids most of this
>> long waiting.
>>
>> Secondly, this patch allows several tasks (readers of
>> the idr) to proceed at the same time.
>>
>> Signed-off-by: Alex Xie <AlexBin.Xie at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  3 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 14 +++++++-------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c     |  3 +--
>>   3 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 063fc73..bfc40d7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -35,6 +35,7 @@
>>  #include <linux/rbtree.h>
>>  #include <linux/hashtable.h>
>>  #include <linux/dma-fence.h>
>> +#include <linux/rwsem.h>
>>
>>  #include <ttm/ttm_bo_api.h>
>>  #include <ttm/ttm_bo_driver.h>
>> @@ -859,7 +860,7 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
>>  struct amdgpu_fpriv {
>>  	struct amdgpu_vm	vm;
>>  	struct amdgpu_bo_va	*prt_va;
>> -	struct mutex		bo_list_lock;
>> +	struct rw_semaphore	bo_list_lock;
>>  	struct idr		bo_list_handles;
>>  	struct amdgpu_ctx_mgr	ctx_mgr;
>>  	u32			vram_lost_counter;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> index 4363f28..bfe736e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> @@ -45,10 +45,10 @@ static int amdgpu_bo_list_create(struct amdgpu_fpriv *fpriv,
>>  	if (!*result)
>>  		return -ENOMEM;
>>
>> -	mutex_lock(&fpriv->bo_list_lock);
>> +	down_read(&fpriv->bo_list_lock);
>>  	r = idr_alloc(&fpriv->bo_list_handles, *result,
>>  		      1, 0, GFP_KERNEL);
>> -	mutex_unlock(&fpriv->bo_list_lock);
>> +	up_read(&fpriv->bo_list_lock);
>
> The protection here is incorrect; you need to take the write side here
> as well.
>
>>
>>  	if (r < 0) {
>>  		kfree(*result);
>> @@ -67,17 +67,17 @@ static void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)
>>  {
>>  	struct amdgpu_bo_list *list;
>>
>> -	mutex_lock(&fpriv->bo_list_lock);
>> +	down_write(&fpriv->bo_list_lock);
>>  	list = idr_remove(&fpriv->bo_list_handles, id);
>>  	if (list) {
>>  		/* Another user may have a reference to this list still */
>>  		mutex_lock(&list->lock);
>>  		mutex_unlock(&list->lock);
>> -		mutex_unlock(&fpriv->bo_list_lock);
>> +		up_write(&fpriv->bo_list_lock);
>>  		amdgpu_bo_list_free(list);
>>  	}
>>  	else {
>> -		mutex_unlock(&fpriv->bo_list_lock);
>> +		up_write(&fpriv->bo_list_lock);
>>  	}
>>  }
>>
>> @@ -173,11 +173,11 @@ amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id)
>>  {
>>  	struct amdgpu_bo_list *result;
>>
>> -	mutex_lock(&fpriv->bo_list_lock);
>> +	down_read(&fpriv->bo_list_lock);
>>  	result = idr_find(&fpriv->bo_list_handles, id);
>>  	if (result)
>>  		mutex_lock(&result->lock);
>> -	mutex_unlock(&fpriv->bo_list_lock);
>> +	up_read(&fpriv->bo_list_lock);
>
> Actually rw_semaphores are a bit less efficient than mutexes, and the
> contention on this is practically non-existent, since at least the open
> source stack makes all BO list operations from a single thread.
>
> So I'm not sure if that is actually more efficient.
>
> Christian.
>
>>  	return result;
>>  }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index f68ced6..c4cba81 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -841,7 +841,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>>  			goto out_suspend;
>>  	}
>>
>> -	mutex_init(&fpriv->bo_list_lock);
>> +	init_rwsem(&fpriv->bo_list_lock);
>>  	idr_init(&fpriv->bo_list_handles);
>>
>>  	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
>> @@ -900,7 +900,6 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>  		amdgpu_bo_list_free(list);
>>
>>  	idr_destroy(&fpriv->bo_list_handles);
>> -	mutex_destroy(&fpriv->bo_list_lock);
>>
>>  	kfree(fpriv);
>>  	file_priv->driver_priv = NULL;
>
>


