Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Wed, 28 Apr 2021 12:31:09 +0200

Am 28.04.21 um 12:05 schrieb Daniel Vetter:
On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
On Tue, Apr 27, 2021 at 1:35 PM Simon Ser <contact@xxxxxxxxxxx> wrote:
On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach <l.stach@xxxxxxxxxxxxxx> wrote:

Ok. So that would only make the following use cases broken for now:

- amd render -> external gpu
- amd video encode -> network device
FWIW, "only" breaking amd render -> external gpu will make us pretty
unhappy
I concur. I have quite a few users with a multi-GPU setup involving
AMD hardware.

Note, if this brokenness can't be avoided, I'd prefer a to get a clear
error, and not bad results on screen because nothing is synchronized
anymore.
It's an upcoming requirement for windows[1], so you are likely to
start seeing this across all GPU vendors that support windows.  I
think the timing depends on how quickly the legacy hardware support
sticks around for each vendor.
Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?

The upcomming hardware generation will have this hardware scheduler as a 
must have, but there are certain ways we can still stick to the old 
approach:

1. The new hardware scheduler currently still supports kernel queues 
which essentially is the same as the old hardware ring buffer.

2. Mapping the top level ring buffer into the VM at least partially 
solves the problem. This way you can't manipulate the ring buffer 
content, but the location for the fence must still be writeable.

For now and the next hardware we are save to support the old submission 
model, but the functionality of kernel queues will sooner or later go 
away if it is only for Linux.

So we need to work on something which works in the long term and get us 
away from this implicit sync.

Christian.

-Daniel

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel