Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx> · Tue, 21 Jul 2020 19:46:25 +0200

On 2020-07-21 15:59, Christian König wrote:
On 21.07.20 at 12:47, Thomas Hellström (Intel) wrote:
...
Yes, we can't do magic. As soon as an indefinite batch makes it to 
such hardware we've lost. But since we can break out while the batch 
is stuck in the scheduler waiting, what I believe we *can* do with 
this approach is to avoid deadlocks due to locally unknown 
dependencies, which has some bearing on this documentation patch, and 
also to allow memory allocation in dma-fence (not memory-fence) 
critical sections, like GPU fault and error handlers, without
resorting to using memory pools.

Avoiding deadlocks is only the tip of the iceberg here.

When you allow the kernel to depend on user space to proceed with some 
operation there are a lot more things which need consideration.

E.g. what happens when a userspace process which has submitted stuff
to the kernel is killed? Are the prepared commands sent to the
hardware or aborted as well? What do we do with other processes
waiting for that stuff?

How do we do resource accounting? When processes need to block when
submitting to the hardware stuff which is not ready, we have a process
we can punish for blocking resources. But how is kernel memory used
for a submission accounted? How do we avoid denial-of-service attacks
here, where somebody eats up all memory by doing submissions which can't
finish?

Hmm. Are these problems really unique to user-space controlled 
dependencies? Couldn't you hit the same or similar problems with 
misbehaving shaders blocking timeline progress?

/Thomas