Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx> · Wed, 22 Jul 2020 08:45:45 +0200

On 2020-07-22 00:45, Dave Airlie wrote:
On Tue, 21 Jul 2020 at 18:47, Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx> wrote:

On 7/21/20 9:45 AM, Christian König wrote:
Am 21.07.20 um 09:41 schrieb Daniel Vetter:
On Mon, Jul 20, 2020 at 01:15:17PM +0200, Thomas Hellström (Intel) wrote:
Hi,

On 7/9/20 2:33 PM, Daniel Vetter wrote:
Comes up every few years, gets somewhat tedious to discuss, let's
write this down once and for all.

What I'm not sure about is whether the text should be more explicit in
flat-out mandating the amdkfd eviction fences for long-running compute
workloads or workloads where userspace fencing is allowed.
Although (in my humble opinion) it might be possible to completely
untangle kernel-introduced fences for resource management from
dma-fences used for completion and dependency tracking, and to lift a
lot of restrictions on dma-fences, including the prohibition of
infinite ones, I think this makes sense for describing the current
state.

Yeah, I think a future patch needs to write up how we want to make that
happen (for some cross-driver consistency) and what needs to be
considered. Some of the necessary parts are already there (for example,
the preemption fences amdkfd has), but I think some clear docs on what's
required from hw, drivers and userspace would be really good.

I'm currently writing that up, but will probably need a few more days
for it.

Great! I put down some (very) initial thoughts a couple of weeks ago
building on eviction fences for various hardware complexity levels here:

https://gitlab.freedesktop.org/thomash/docs/-/blob/master/Untangling%20dma-fence%20and%20memory%20allocation.odt

We are seeing HW that has recoverable GPU page faults, but only for
compute tasks, and scheduler hardware without semaphores for graphics.

So a single driver may have to expose both models to userspace, which
also introduces the problem of how to interoperate between the two
models on one card.

Dave.

Hmm, yes. To begin with, it's important to note that this is not a
replacement for new programming models or APIs. This is something that
takes place internally in drivers to mitigate many of the restrictions
currently imposed on dma-fence and documented in this and previous
series. It's basically the driver-private narrow completions Jason
suggested in the lockdep patch discussions, implemented the same way as
eviction fences.

The memory fence API would be local to helpers and middle layers like
TTM, and the corresponding drivers. The only cross-driver visibility
would be that the dma-buf move_notify() callback would not be allowed to
wait on dma-fences or on anything that depends on a dma-fence.
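
To make the move_notify() rule a bit more concrete, here is a minimal
user-space sketch of it. Everything in it is hypothetical and for
illustration only: enum fence_kind, struct fence, the preempt hook and
move_notify_wait() are made-up names, not kernel, dma-buf or TTM API.
The assumption being modelled is that a driver-private memory fence,
like an amdkfd eviction fence, can always be forced to signal by the
driver itself (by preempting the work), so waiting on it is fine, while
a dma-fence may depend on arbitrary other work, so the wait is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence kinds, for illustration only; not kernel types. */
enum fence_kind {
	FENCE_DMA,    /* cross-driver completion fence; may depend on other work */
	FENCE_MEMORY, /* driver-private memory fence, modelled on eviction fences */
};

struct fence {
	enum fence_kind kind;
	bool signaled;
	/* Hypothetical driver hook that forces completion, e.g. by preemption. */
	void (*preempt)(struct fence *f);
};

/* Hypothetical preempt hook: the driver stops the work and signals. */
static void example_preempt(struct fence *f)
{
	f->signaled = true;
}

/*
 * Model of the rule for a move_notify()-style callback: it may only wait
 * on fences the driver can force to signal on its own. Returns 0 once the
 * fence is signaled, or -1 if waiting would depend on a dma-fence.
 */
static int move_notify_wait(struct fence *f)
{
	assert(f != NULL);
	if (f->kind != FENCE_MEMORY)
		return -1; /* refused: would wait on a dma-fence */
	if (!f->signaled && f->preempt)
		f->preempt(f); /* driver forces the memory fence to signal */
	return f->signaled ? 0 : -1;
}
```

A real implementation would sleep on the memory fence rather than signal
it synchronously; the point of the sketch is only which kinds of fence
the callback is permitted to depend on.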

So with that in mind, I don't foresee engines with different 
capabilities on the same card being a problem.

/Thomas