On 01/25/2018 03:57 PM, Thomas Hellstrom wrote:
> On 01/25/2018 10:59 AM, Chunming Zhou wrote:
>> There is a scheduling balance issue around get_node, like:
>> a. Process A allocates all of memory and uses it for a submission.
>> b. Process B tries to allocate memory and waits for process A's BO to
>> go idle during eviction.
>> c. Process A completes its job, and process B's eviction puts process
>> A's BO node; but in the meantime process C comes along to allocate a
>> BO, gets a node directly, and does a submission, so process B again
>> waits for process C's BO to go idle.
>> d. Repeat the above steps, and process B can be delayed much longer.
>>
>> Add a mutex to guarantee the allocation sequence for the same domain.
>> There is, however, a possibility that visible VRAM could be evicted to
>> invisible VRAM; the tricky part is that both are handled by the same
>> domain manager, so that needs special handling.
>>
>> Change-Id: I260e8eb704f7b4788b071d3f641f21b242912256
>> Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
>
> I think this is a good approach, however there are two things that IMO
> need fixing.

[...]

Thinking a bit more about this, the end result would be that typical "C"
processes would get an unfair amount of GPU scheduling. Isn't it actually
the task of a scheduler outside of TTM to mitigate this?

Further, TTM has had a design principle of avoiding locks held while
waiting for the GPU, with the exception of buffer object reservations. I
think this would be the first violator, but a fairly harmless one.

I can see the use for it, though. It would also allow scanning the LRU
lists for a suitable set of buffer objects to evict, rather than evicting
in strict LRU order...

/Thomas
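
[Editor's note: to make the proposal concrete, here is a minimal sketch of
the per-domain serialization Chunming describes. domain_manager,
domain_alloc_node() and domain_evict_one() are hypothetical stand-ins, not
the actual TTM entry points the patch touches.]

#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/errno.h>

/* Hypothetical per-domain manager; the real patch adds a similar mutex
 * to TTM's per-memory-type manager. All names here are illustrative. */
struct domain_manager {
	struct mutex alloc_mutex;	/* serializes get-node + eviction */
	struct list_head lru;		/* LRU of buffers in this domain */
};

/* Assumed helpers, not real TTM API:
 * domain_alloc_node() - try to carve a node out of free space,
 *                       returns -ENOSPC when the domain is full
 * domain_evict_one()  - evict the LRU head and wait for it to idle */
int domain_alloc_node(struct domain_manager *man);
int domain_evict_one(struct domain_manager *man);

int domain_get_node(struct domain_manager *man)
{
	int ret;

	/* Holding the mutex across both the allocation attempt and the
	 * eviction wait is the point of the patch: a latecomer (process
	 * C) can no longer slip in and take the space that an earlier
	 * waiter (process B) just paid for by waiting on eviction. */
	mutex_lock(&man->alloc_mutex);
	do {
		ret = domain_alloc_node(man);
		if (ret != -ENOSPC)
			break;
		/* No free space: evict one buffer and wait for the GPU
		 * to go idle. This is the wait-under-lock that Thomas
		 * points out above. */
		ret = domain_evict_one(man);
	} while (ret == 0);
	mutex_unlock(&man->alloc_mutex);

	return ret;
}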
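
[Editor's note: and a sketch of the LRU-scanning possibility Thomas
mentions, building on the domain_manager above. struct dm_bo and
fits_request() are assumptions; TTM at the time evicted strictly from the
LRU head.]

/* Illustrative buffer object tracked on the domain's LRU list. */
struct dm_bo {
	struct list_head lru;
	/* ... placement, size, etc. ... */
};

/* Hypothetical predicate: would evicting this BO satisfy the request? */
bool fits_request(struct dm_bo *bo, size_t size);

struct dm_bo *domain_pick_victim(struct domain_manager *man, size_t size)
{
	struct dm_bo *bo;

	/* With the allocation mutex held, the list cannot change under
	 * us, so we may scan for a buffer whose eviction actually makes
	 * room, rather than blindly taking the LRU head. */
	list_for_each_entry(bo, &man->lru, lru)
		if (fits_request(bo, size))
			return bo;

	/* Nothing suitable found: fall back to strict LRU order. */
	return list_first_entry_or_null(&man->lru, struct dm_bo, lru);
}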