Re: Plumbing explicit synchronization through the Linux ecosystem

Daniel Vetter <daniel@xxxxxxxx> · Thu, 19 Mar 2020 11:51:29 +0100

On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
> On 2020-03-16 7:33 p.m., Marek Olšák wrote:
> > On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer <michel@xxxxxxxxxxx> wrote:
> >> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> >>> The synchronization works because the Mesa driver waits for idle (drains
> >>> the GFX pipeline) at the end of command buffers and there is only 1
> >>> graphics queue, so everything is ordered.
> >>>
> >>> The GFX pipeline runs asynchronously to the command buffer, meaning the
> >>> command buffer only starts draws and doesn't wait for completion. If the
> >>> Mesa driver didn't wait at the end of the command buffer, the command
> >>> buffer would finish and a different process could start execution of its
> >>> own command buffer while shaders of the previous process are still
> >> running.
> >>>
> >>> If the Mesa driver submits a command buffer internally (because it's
> >> full),
> >>> it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
> >>> ended and a new one started.
> >>>
> >>> The waiting at the end of command buffers happens only when the flush is
> >>> external (Swap buffers, glFlush).
> >>>
> >>> It's a performance problem, because the GFX queue is blocked until the
> >> GFX
> >>> pipeline is drained at the end of every frame at least.
> >>>
> >>> So explicit fences for SwapBuffers would help.
> >>
> >> Not sure what difference it would make, since the same thing needs to be
> >> done for explicit fences as well, doesn't it?
> > 
> > No. Explicit fences don't require userspace to wait for idle in the command
> > buffer. Fences are signalled when the last draw is complete and caches are
> > flushed. Before that happens, any command buffer that is not dependent on
> > the fence can start execution. There is never a need for the GPU to be idle
> > if there is enough independent work to do.
> 
> I don't think explicit fences in the context of this discussion imply
> using that different fence signalling mechanism though. My understanding
> is that the API proposed by Jason allows implicit fences to be used as
> explicit ones and vice versa, so presumably they have to use the same
> signalling mechanism.
> 
> 
> Anyway, maybe the different fence signalling mechanism you describe
> could be used by the amdgpu kernel driver in general, then Mesa could
> drop the waits for idle and get the benefits with implicit sync as well?

Yeah, this is entirely about the programming model visible to userspace.
There shouldn't be any impact on the driver's choice of a top vs. bottom
of the gpu pipeline used for synchronization, that's entirely up to what
you're hw/driver/scheduler can pull off.

Doing a full gfx pipeline flush for shared buffers, when your hw can do
be, sounds like an issue to me that's not related to this here at all. It
might be intertwined with amdgpu's special interpretation of dma_resv
fences though, no idea. We might need to revamp all that. But for a
userspace client that does nothing fancy (no multiple render buffer
targets in one bo, or vk style "I write to everything all the time,
perhaps" stuff) there should be 0 perf difference between implicit sync
through dma_resv and explicit sync through sync_file/syncobj/dma_fence
directly.

If there is I'd consider that a bit a driver bug.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch