Am 12.05.21 um 10:13 schrieb Daniel Vetter:
On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
Am 11.05.21 um 18:48 schrieb Daniel Vetter:
[SNIP]
Why?
If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence
Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.
Ok, wait a second. I really don't understand what's going on here.
The out fence is just to let the userspace know when the frame is displayed.
Or rather when the old frame is no longer displayed so that it can be
reused, right?
Then why does that need to be a dma_fence? We don't use that for memory
management anywhere, don't we?
It can be a sync_file, so you can queue up the next rendering targeting
the old backbuffer right away. On memory constraint devices where
triple-buffering is prohibitive this is apparently a pretty cool trick or
something like that.
Yeah, I'm aware of that.
But we don't really need that for device we want to interop with
userspace queues, don't we?
So you have to do an uapi change here. At that point we might as well do
it right.
I mean in the worst case we might need to allow user fences with sync_files
as well when that is really used outside of Android.
But I still don't see the fundamental problem here.
Making sync_file use implicit is more pain, it becomes sprawling pretty
quickly.
Agreed, but I don't see supporting sync_file and out fences as something
necessarily.
As far as I can see we don't need to support that burden for the use
cases we have for implicit sync.
And even if we have to it is possible to implement.
Anyway I think we're just turning in circles. My take is that your patch
series here demonstrates nothing beyond "adding function calls that do
nothing is easy", the real deal is in making it work. And I think it's
easier to make this all work with explicit userspace fencing first and
then worry about how we maybe make this work implicitly. Instead of what
you propose, which is rig up a lot of scaffolding without having any idea
whether it's even in the right place. That seems very backwards to me.
And that's what I disagree on.
Supporting implicit sync in the kernel for the functionality we need to
amdgpu is relatively easily.
Change all of userspace to not rely on implicit sync any more is really
hard compared to that.
Dropping implicit sync in userspace has a lot of advantage and should be
pushed for, but not because it is a prerequisite of user fences.
Regards,
Christian.
Plus I really don't like retro-fitting userspace fences into implicit sync
and sync_file and everything. But that's not the main issue I have here.
-Daniel
Regards,
Christian.
Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.
So I have a plan here, what was yours?
As far as I see that should still work perfectly fine and I have the strong
feeling I'm missing something here.
Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?
In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.
Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.
I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).
So if there's more here, you need to flesh it out more or I just dont get
what you're actually trying to demonstrate.
Well I'm trying to figure out why you see it as such a problem to keep
implicit sync around.
As far as I can tell it is completely octagonal if we use implicit/explicit
and dma_fence/user_fence.
It's just a different implementation inside the kernel.
See above. It falls apart with the atomic ioctl.
-Daniel