Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea

Daniel Vetter <daniel@xxxxxxxx> · Tue, 21 Jul 2020 11:24:20 +0200

On Tue, Jul 21, 2020 at 11:16 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Tue, Jul 21, 2020 at 10:55 AM Christian König
> <christian.koenig@xxxxxxx> wrote:
> >
> > Am 21.07.20 um 10:47 schrieb Thomas Hellström (Intel):
> > >
> > > On 7/21/20 9:45 AM, Christian König wrote:
> > >> Am 21.07.20 um 09:41 schrieb Daniel Vetter:
> > >>> On Mon, Jul 20, 2020 at 01:15:17PM +0200, Thomas Hellström (Intel) wrote:
> > >>>> Hi,
> > >>>>
> > >>>> On 7/9/20 2:33 PM, Daniel Vetter wrote:
> > >>>>> Comes up every few years, gets somewhat tedious to discuss, let's
> > >>>>> write this down once and for all.
> > >>>>>
> > >>>>> What I'm not sure about is whether the text should be more explicit in
> > >>>>> flat out mandating the amdkfd eviction fences for long running compute
> > >>>>> workloads or workloads where userspace fencing is allowed.
> > >>>> Although (in my humble opinion) it might be possible to completely untangle
> > >>>> kernel-introduced fences for resource management and dma-fences used for
> > >>>> completion- and dependency tracking and lift a lot of restrictions for the
> > >>>> dma-fences, including prohibiting infinite ones, I think this makes sense
> > >>>> describing the current state.
> > >>> Yeah I think a future patch needs to type up how we want to make that
> > >>> happen (for some cross driver consistency) and what needs to be
> > >>> considered. Some of the necessary parts are already there (with like the
> > >>> preemption fences amdkfd has as an example), but I think some clear docs
> > >>> on what's required from both hw, drivers and userspace would be really
> > >>> good.
> > >>
> > >> I'm currently writing that up, but probably still need a few days for
> > >> this.
> > >
> > > Great! I put down some (very) initial thoughts a couple of weeks ago
> > > building on eviction fences for various hardware complexity levels here:
> > >
> > > https://gitlab.freedesktop.org/thomash/docs/-/blob/master/Untangling%20dma-fence%20and%20memory%20allocation.odt
> > >
> >
> > I don't think that this will ever be possible.
> >
> > See, what Daniel describes in his text is that indefinite fences are a
> > bad idea for memory management, and I think that this is a fixed fact.
> >
> > In other words the whole concept of submitting work to the kernel which
> > depends on some user space interaction doesn't work and never will.
> >
> > What can be done is that dma_fences work with hardware schedulers. E.g.
> > what the KFD tries to do with its preemption fences.
> >
> > But for this you need a better concept and description of what the
> > hardware scheduler is supposed to do and how that interacts with
> > dma_fence objects.
>
> Yeah I think trying to split dma_fence won't work, simply because of
> inertia. Creating an entirely new thing for augmented userspace
> controlled fencing, and then jotting down all the rules the
> kernel/hw/userspace need to obey to not break dma_fence is what I had
> in mind. And I guess that's also what Christian is working on. E.g.
> just going through all the cases of how much your hw can preempt or
> handle page faults on the gpu, and what that means in terms of
> dma_fence_begin/end_signalling and other constraints would be really
> good.

Or rephrased in terms of Thomas' doc: dma-fence will stay the memory
fence, and also the sync fence for current userspace and winsys.

Then we create a new thing, plus a complete protocol and driver revving
of the entire world. The really hard part is the asymmetry: running old
stuff on a new stack is possible (we'd be totally screwed otherwise,
since it would become a system wide flag day). But running new stuff on
an old stack (even if it's just something in userspace like the
compositor) doesn't work, because then you tie the new synchronization
fences back into the dma-fence memory fences, and game over.
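
As a rough illustration of that asymmetry, here is a toy Python model. It
is not kernel code and not the dma_fence API: the Fence class, the
"bounded" flag and both helper functions are made up for this sketch. A
"bounded" fence stands for a dma-fence memory fence that has to signal in
finite time; an unbounded one stands for a new userspace-controlled sync
fence.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical stand-in for a fence; not the kernel's struct dma_fence."""
    name: str
    # True: dma-fence style memory fence, must signal in finite time.
    # False: userspace-controlled sync fence, may never signal.
    bounded: bool
    waits_on: list = field(default_factory=list)


def can_stall_forever(fence, seen=None):
    """True if this fence, or anything it transitively waits on, is unbounded."""
    seen = set() if seen is None else seen
    if id(fence) in seen:
        return False  # already visited on this walk
    seen.add(id(fence))
    if not fence.bounded:
        return True
    return any(can_stall_forever(dep, seen) for dep in fence.waits_on)


def breaks_memory_fence(fence):
    """True if a bounded fence ends up depending on an unbounded one."""
    return fence.bounded and any(can_stall_forever(dep) for dep in fence.waits_on)


# Old stuff on a new stack: a new sync fence waits on an old dma-fence.
old_render = Fence("old client render", bounded=True)
new_sync = Fence("new userspace sync", bounded=False, waits_on=[old_render])

# New stuff on an old stack: an old compositor's dma-fence waits on a
# new client's userspace sync fence.
client_sync = Fence("new client sync", bounded=False)
compositor = Fence("old compositor dma-fence", bounded=True, waits_on=[client_sync])
```

In this model breaks_memory_fence(new_sync) is False and
breaks_memory_fence(compositor) is True, which is the "game over" case.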

So yeah, around 5 years or so for anything that wants to use a winsys,
or at least that's what it usually takes us to do something like this
:-/ Entirely stand-alone compute workloads (irrespective of whether it's
cuda, cl, vk or whatever) don't have that problem ofc.
-Daniel

> -Daniel
>
> >
> > Christian.
> >
> > >
> > > /Thomas
> > >
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch