On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote:
>
> All,
>
> Sorry for casting such a broad net with this one. I'm sure most people
> who reply will get at least one mailing list rejection. However, this
> is an issue that affects a LOT of components and that's why it's
> thorny to begin with. Please pardon the length of this e-mail as
> well; I promise there's a concrete point/proposal at the end.
>
>
> Explicit synchronization is the future of graphics and media. At
> least, that seems to be the consensus among all the graphics people
> I've talked to. I had a chat with one of the lead Android graphics
> engineers recently who told me that doing explicit sync from the start
> was one of the best engineering decisions Android ever made. It's
> also the direction being taken by more modern APIs such as Vulkan.
>
>
> ## What are implicit and explicit synchronization?
>
> For those that aren't familiar with this space, GPUs, media encoders,
> etc. are massively parallel and synchronization of some form is
> required to ensure that everything happens in the right order and
> avoid data races. Implicit synchronization is when bits of work (3D,
> compute, video encode, etc.) are implicitly ordered based on the
> absolute CPU-time order in which API calls occur. Explicit
> synchronization is when the client (whatever that means in any given
> context) provides the dependency graph explicitly via some sort of
> synchronization primitives. If you're still confused, consider the
> following examples:
>
> With OpenGL and EGL, almost everything is implicit sync. Say you have
> two OpenGL contexts sharing an image where one writes to it and the
> other textures from it. The way the OpenGL spec works, the client has
> to make the API calls to render to the image before (in CPU time) it
> makes the API calls which texture from the image.
> As long as it does this (and maybe inserts a glFlush?), the driver
> will ensure that the rendering completes before the texturing happens
> and you get correct contents.
>
> Implicit synchronization can also happen across processes. Wayland,
> for instance, is currently built on implicit sync where the client
> does their rendering and then does a hand-off (via wl_surface::commit)
> to tell the compositor it's done at which point the compositor can now
> texture from the surface. The hand-off ensures that the client's
> OpenGL API calls happen before the server's OpenGL API calls.
>
> A good example of explicit synchronization is the Vulkan API. There,
> a client (or multiple clients) can simultaneously build command
> buffers in different threads where one of those command buffers
> renders to an image and the other textures from it and then submit
> both of them at the same time with instructions to the driver for
> which order to execute them in. The execution order is described via
> the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore
> extension, you can even submit the work which does the texturing
> BEFORE the work which does the rendering and the driver will sort it
> out.
>
> The #1 problem with implicit synchronization (which explicit solves)
> is that it leads to a lot of over-synchronization both in client space
> and in driver/device space. The client has to synchronize a lot more
> because it has to ensure that the API calls happen in a particular
> order. The driver/device have to synchronize a lot more because they
> never know what is going to end up being a synchronization point as an
> API call on another thread/process may occur at any time. As we move
> to more and more multi-threaded programming this synchronization (on
> the client-side especially) becomes more and more painful.
>
>
> ## Current status in Linux
>
> Implicit synchronization in Linux works via the kernel's internal
> dma_buf and dma_fence data structures.
> A dma_fence is a tiny object which represents the "done" status for
> some bit of work. Typically, dma_fences are created as a by-product
> of someone submitting some bit of work (say, 3D rendering) to the
> kernel. The dma_buf object has a set of dma_fences on it representing
> shared (read) and exclusive (write) access to the object. When work
> is submitted which, for instance, renders to the dma_buf, it's queued
> waiting on all the fences on the dma_buf and a dma_fence is created
> representing the end of said rendering work and it's installed as the
> dma_buf's exclusive fence. This way, the kernel can manage all its
> internal queues (3D rendering, display, video encode, etc.) and know
> which things to submit in what order.
>
> For the last few years, we've had sync_file in the kernel and it's
> plumbed into some drivers. A sync_file is just a wrapper around a
> single dma_fence. A sync_file is typically created as a by-product of
> submitting work (3D, compute, etc.) to the kernel and is signaled when
> that work completes. When a sync_file is created, it is guaranteed by
> the kernel that it will become signaled in finite time and, once it's
> signaled, it remains signaled for the rest of time. A sync_file is
> represented in UAPIs as a file descriptor and can be used with normal
> file APIs such as dup(). It can be passed into another UAPI which
> does some bit of queued work and the submitted work will wait for the
> sync_file to be triggered before executing. A sync_file also supports
> poll() if you want to wait on it manually.
>
> Unfortunately, sync_file is not broadly used and not all kernel GPU
> drivers support it. Here's a very quick overview of my understanding
> of the status of various components (I don't know the status of
> anything in the media world):
>
> - Vulkan: Explicit synchronization all the way but we have to go
> implicit as soon as we interact with a window-system.
> Vulkan has APIs to import/export sync_files to/from its VkSemaphore
> and VkFence synchronization primitives.
> - OpenGL: Implicit all the way. There are some EGL extensions to
> enable some forms of explicit sync via sync_file but OpenGL itself is
> still implicit.
> - Wayland: Currently depends on implicit sync in the kernel (accessed
> via EGL/OpenGL). There is an unstable extension to allow passing
> sync_files around but it's questionable how useful it is right now
> (more on that later).
> - X11: With present, it has these "explicit" fence objects but
> they're always a shmfence which lets the X server and client do a
> userspace CPU-side hand-off without going over the socket (and
> round-tripping through the kernel). However, the only thing that
> fence does is order the OpenGL API calls in the client and server and
> the real synchronization is still implicit.
> - linux/i915/gem: Fully supports using sync_file or syncobj for
> explicit sync.
> - linux/amdgpu: Supports sync_file and syncobj but it still
> implicitly syncs sometimes due to its internal memory residency
> handling which can lead to over-synchronization.
> - KMS: Implicit sync all the way. There are no KMS APIs which take
> explicit sync primitives.

Correction: Apparently, I missed some things. If you use atomic, KMS does have explicit in- and out-fences. Non-atomic users (e.g. X11) are still in trouble but most Wayland compositors use atomic these days.

> - v4l: ???
> - gstreamer: ???
> - Media APIs such as vaapi etc.: ???
>
>
> ## Chicken and egg problems
>
> Ok, this is where it starts getting depressing. I made the claim
> above that Wayland has an explicit synchronization protocol that's of
> questionable usefulness. I would claim that basically any bit of
> plumbing we do through window systems is currently of questionable
> usefulness. Why?
>
> From my perspective, as a Vulkan driver developer, I have to deal with
> the fact that Vulkan is an explicit sync API but Wayland and X11
> aren't. Unfortunately, the Wayland extension solves zero problems for
> me because I can't really use it unless it's implemented in all of the
> compositors. Until every Wayland compositor I care about my users
> being able to use (which is basically all of them) supports the
> extension, I have to continue to carry around my pile of hacks to keep
> implicit sync and Vulkan working nicely together.
>
> From the perspective of a Wayland compositor (I used to play in this
> space), they'd love to implement the new explicit sync extension but
> can't. Sure, they could wire up the extension, but the moment they go
> to flip a client buffer to the screen directly, they discover that KMS
> doesn't support any explicit sync APIs.

As per the above correction, Wayland compositors aren't nearly as bad off as I initially thought. There may still be weird screen capture cases but the normal cases of compositing and displaying via KMS/atomic should be in reasonably good shape.

> So, yes, they can technically
> implement the extension assuming the EGL stack they're running on has
> the sync_file extensions but any client buffers which come in using
> the explicit sync Wayland extension have to be composited and can't be
> scanned out directly. As a 3D driver developer, I absolutely don't
> want compositors doing that because my users will complain about
> performance issues due to the extra blit.
>
> Ok, so let's say we get KMS wired up with explicit sync. That solves
> all our problems, right? It does, right up until someone decides that
> they want to screen capture their Wayland session via some hardware
> media encoder that doesn't support explicit sync. Now we have to
> plumb it all the way through the media stack, gstreamer, etc. Great,
> so let's do that!
> Oh, but gstreamer won't want to plumb it through
> until they're guaranteed that they can use explicit sync when
> displaying on X11 or Wayland. Are you seeing the problem?
>
> To make matters worse, since most things are doing implicit
> synchronization today, it's really easy to get your explicit
> synchronization wrong and never notice. If you forget to pass a
> sync_file into one place (say you never notice KMS doesn't support
> them), it will probably work anyway thanks to all the implicit sync
> that's going on elsewhere.
>
> So, clearly, we all need to go write piles of code that we can't
> actually properly test until everyone else has written their piece and
> then we use explicit sync if and only if all components support it.
> Really? We're going to do multiple years of development and then just
> hope it works when we finally flip the switch? That doesn't sound
> like a good plan to me.
>
>
> ## A proposal: Implicit and explicit sync together
>
> How to solve all these chicken-and-egg problems is something I've been
> giving quite a bit of thought (and talking with many others about) in
> the last couple of years. One motivation for this is that we have to
> deal with a mismatch in Vulkan. Another motivation is that I'm
> becoming increasingly unhappy with the way that synchronization,
> memory residency, and command submission are inherently intertwined in
> i915 and would like to break things apart. Towards that end, I have
> an actual proposal.
>
> A couple weeks ago, I sent a series of patches to the dri-devel
> mailing list which adds a pair of new ioctls to dma-buf which allow
> userspace to manually import or export a sync_file from a dma-buf.
> The idea is that something like a Wayland compositor can switch to
> 100% explicit sync internally once the ioctl is available.
> If it gets
> buffers in from a client that doesn't use the explicit sync extension,
> it can pull a sync_file from the dma-buf and use that exactly as it
> would a sync_file passed via the explicit sync extension. When it
> goes to scan out a user buffer and discovers that KMS doesn't accept
> sync_files (or if it tries to use that pesky media encoder no one has
> converted), it can take its sync_file for display and stuff it into
> the dma-buf before handing it to KMS.
>
> Along with the kernel patches, I've also implemented support for this
> in the Vulkan WSI code used by ANV and RADV. With those patches, the
> only requirement on the Vulkan drivers is that you be able to export
> any VkSemaphore as a sync_file and temporarily import a sync_file into
> any VkFence or VkSemaphore. As long as that works, the core Vulkan
> driver only ever sees explicit synchronization via sync_file. The WSI
> code uses these new ioctls to translate the implicit sync of X11 and
> Wayland to the explicit sync the Vulkan driver wants.
>
> I'm hoping (and here's where I want a sanity check) that a simple API
> like this will allow us to finally start moving the Linux ecosystem
> over to explicit synchronization one piece at a time in a way that's
> actually correct. (No Wayland explicit sync with compositors hoping
> KMS magically works even though it doesn't have a sync_file API.)
> Once some pieces in the ecosystem start moving, there will be
> motivation to start moving others and maybe we can actually build the
> momentum to get most everything converted.
>
> For reference, you can find the kernel RFC patches and mesa MR here:
>
> https://lists.freedesktop.org/archives/dri-devel/2020-March/258833.html
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
>
> At this point, I welcome your thoughts, comments, objections, and
> maybe even help/review. :-)
>
> --Jason Ekstrand
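
For anyone who has not handled a sync_file from userspace: since it is an ordinary file descriptor that supports poll(), a manual CPU-side wait can be sketched roughly as below. This is only an illustration under assumptions: wait_fence_fd is a made-up helper name, not a kernel or Mesa API, and the fd is assumed to be a sync_file obtained from some driver or window-system interface.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper (not an existing API): wait until the fence
 * behind `fd` signals. A signaled sync_file polls as readable.
 * Returns 1 if signaled, 0 on timeout, -1 on error (errno is set). */
int wait_fence_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* Retry if a signal interrupts the wait. Note that each retry
     * restarts the full timeout, which is fine for a sketch. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;            /* 0: timed out, -1: poll() failed */

    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;        /* bad fd or fence in an error state */
        return -1;
    }
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Something like wait_fence_fd(fd, 16) would block for at most 16 ms. Where the consumer is another UAPI that accepts a sync_file, passing the fd along instead avoids the CPU wait altogether.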