That patch won't work correctly like this.

When the lock is dropped it is possible that the BO is removed from the delete list and ttm_bo_cleanup_refs() starts to wait for the wrong reservation object.

I think we can remove the wait for bo->resv now and always wait for bo->ttm_resv, but I'm not 100% sure. Need to double check the code as well.

Christian.

Am 23.01.2018 um 20:25 schrieb Tom St Denis:
> On 22/01/18 01:42 AM, Chunming Zhou wrote:
>>
>>
>> On 2018-01-20 02:23, Tom St Denis wrote:
>>> On 19/01/18 01:14 PM, Tom St Denis wrote:
>>>> Hi all,
>>>>
>>>> In the function ttm_bo_cleanup_refs() it seems possible to reach
>>>> line 551 without entering the block at line 516, which means you'll
>>>> be unlocking a mutex that wasn't locked.
>>>>
>>>> Now it might be that in the course of the API this pattern cannot
>>>> be expressed, but it's not clear from the function alone that that
>>>> is the case.
>>>
>>>
>>> Looking further, it seems the behaviour depends on locking in the
>>> parent callers. That's kind of a no-no, right? Shouldn't the lock
>>> ideally be taken and released in the same function?
>> Same feeling here.
>>
>> Regards,
>> David Zhou
>
> Attached is a patch that addresses this.
>
> I can't see any obvious race in functions that call
> ttm_bo_cleanup_refs() between the time they let go of the lock and the
> time it's taken again in the call.
>
> Running it on my system doesn't produce anything notable, though the
> KASAN with DRI_PRIME=1 issue is still there (this patch neither causes
> that nor fixes it).
>
> Tom
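
[Editor's note] The unlock-without-lock shape the thread is debating can be sketched as follows. This is a hypothetical pthreads illustration, not the actual TTM code (which uses a spinlock, and different function names); `cleanup_unbalanced`, `cleanup_balanced`, `lru_lock`, and `need_wait` are made-up names for this sketch only:

```c
/* Hypothetical sketch of the locking pattern under discussion -- not the
 * actual ttm_bo_cleanup_refs() code.  cleanup_unbalanced() mirrors the
 * shape Tom describes: the lock is only taken on one path (the "block at
 * line 516") while the unlock (at "line 551") runs unconditionally, so
 * correctness silently depends on the caller already holding the lock on
 * the other path.  cleanup_balanced() shows the alternative David and Tom
 * argue for: lock and unlock paired in the same function. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock taken conditionally, released unconditionally: if need_wait is
 * false, this unlocks a mutex it never locked unless the caller did. */
static void cleanup_unbalanced(bool need_wait)
{
	if (need_wait) {
		pthread_mutex_lock(&lru_lock);
		/* ... wait for the BO's fences to signal ... */
	}
	/* ... remove the BO from the delete list ... */
	pthread_mutex_unlock(&lru_lock);
}

/* Lock and unlock paired in one function: no hidden caller contract,
 * and the function is correct regardless of who calls it. */
static void cleanup_balanced(void)
{
	pthread_mutex_lock(&lru_lock);
	/* ... wait for fences, then remove the BO from the delete list ... */
	pthread_mutex_unlock(&lru_lock);
}
```

The unbalanced shape is not automatically a bug, but it turns the lock state into an undocumented part of the function's contract, which is exactly why it is hard to verify "from the function alone".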