On Wed, May 26, 2021 at 6:09 AM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
> On Mon, 24 May 2021 at 18:11, Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote:
> > 3. Userspace memory fences.
> >
> > Note that timeline syncobj is NOT in that list. IMO, all the "wait
> > for submit" stuff is an implementation detail we needed in order to
> > get the timeline semantics on top of immutable SW fences. Under the
> > hood it's all dma_fence; this just gives us a shareable container so
> > we can implement VK_KHR_timeline_semaphore with sharing. I really
> > don't want to make Wayland protocol around it if memory fences are the
> > final solution.
>
> Typing out the Wayland protocol isn't the hard bit. If we just need to
> copy and sed syncobj to weirdsyncobj, no problem really, and it gives
> us a six-month head start on painful compositor-internal surgery
> whilst we work on common infrastructure to ship userspace fences
> around (mappable dmabuf with the sync bracketing? FD where every
> read() gives you the current value? memfd? other?).

I feel like I should elaborate more about timelines. In my earlier reply, my commentary about timeline syncobj was mostly focused around helping people avoid typing. That's not really the full story, though, and I hope more context will help.

First, let me say that timeline syncobj was designed as a mechanism to implement VK_KHR_timeline_semaphore without inserting future fences into the kernel. It's entirely designed around the needs of Vulkan drivers, not really as a window-system primitive. The semantics are designed around one driver communicating to another that new fences have been added and it's safe to kick off more rendering. I'm not convinced that it's the right object for window-systems and I'm also not convinced that it's a good idea to try and make a version of it that's a wrapper around a userspace memory fence. (I'm going to start typing UMF for userspace memory fence because it's long to type out.) Why?
Well, the fundamental problem with timelines in general is trying to figure out when it's about to be done. But timeline syncobj solves this for us! It gives us this fancy super-useful ioctl! Right? Uh... not as well as I'd like. Let's say we make a timeline syncobj that's a wrapper around a userspace memory fence. What do we do with that ioctl? As I mentioned above, the kernel doesn't have any clue when it will be triggered so that ioctl turns into an actual wait. That's no good because it creates unnecessary stalls.

There's another potential solution here: Have each UMF be two timelines: submitted and completed. At the start of every batch that's supposed to trigger a UMF, we set the "submitted" side and then, when it completes, we set the "completed" side. Ok, great, now we can get at the "about to be done" with the submitted side, implement the ioctl, and we're all good, right? Sadly, no. There's no guarantee about how long a "batch" takes. So there's no universal timeout the kernel can apply. Also, if it does time out, the kernel doesn't know who to blame for the timeout and how to prevent itself from getting in trouble again. The compositor does, so in theory, given the right ioctls, it could detect the -ETIME and kill that client. Not a great solution.

The best option I've been able to come up with for this is some sort of client-provided signal. Something where it says, as part of submit or somewhere else, "I promise I'll be done soon" where that promise comes with dire consequences if it's not. At that point, we can turn the UMF and a particular wait value into a one-shot fence like a dma_fence or sync_file, or signal a syncobj on it. If it ever times out, we kick their context. In Vulkan terminology, they get VK_ERROR_DEVICE_LOST.

There are two important bits here: First, it's based on a client-provided thing. With a full timeline model and wait-before-signal, we can't infer when something is about to be done.
Only the client knows when it submitted its last node in the dependency graph and the whole mess is unblocked. Second, the dma_fence is created within the client's driver context. If it's created compositor-side, the kernel doesn't know who to blame if things go badly. If we create it in the client, it's pretty easy to make context death on -ETIME part of the contract.

(Before danvet jumps in here and rants about how UMF -> dma_fence isn't possible, I haven't forgotten. I'm pretending, for now, that we've solved some of those problems.)

Another option is to just stall on the UMF until it's done. Yeah, kind of terrible and high-latency, but it always works and doesn't involve any complex logic to kill clients. If a client never gets around to signaling a fence, it just never repaints. The compositor keeps going like nothing's wrong. Maybe, if the client submits lots of frames without ever triggering, it'll hit some max queue depth somewhere and the client gets killed, but that's it. More likely, the client's vkAcquireNextImage will start timing out and it'll crash.

I suspect where we might actually land is some combination of the two depending on client choice. If the client wants to be dumb, it gets the high-latency always-works path. If the client really wants lowest-latency VRR, it has to take the smarter path and risk VK_ERROR_DEVICE_LOST if it misses too far.

But the point of all of this is, neither of the above two paths has anything to do with the compositor calling a "wait for submit" ioctl. Building a design around that and baking it into protocol is, IMO, a mistake. I don't see any valid way to handle this mess without "wait for submit" either not existing or existing only client-side for the purposes of WSI.

--Jason
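
For illustration only, here is a toy Python model of the two paths described in this mail: the "promise" path, where a UMF plus a wait value is turned into a one-shot fence inside the client's context and a missed deadline kills that context, and the "stall" path, where the compositor simply keeps the old frame until the UMF has really reached its value. Every name here (ClientContext, UMF, OneShotFence, stall_path_ready, the integer clock) is hypothetical and invented for the sketch; none of it is a real kernel, DRM, or Vulkan interface, and it glosses over the unsolved UMF -> dma_fence problems mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ClientContext:
    """Hypothetical stand-in for a client's driver context."""
    name: str
    device_lost: bool = False  # models the client seeing VK_ERROR_DEVICE_LOST


@dataclass
class UMF:
    """Hypothetical userspace memory fence: a counter that only moves forward."""
    value: int = 0

    def signal(self, new_value: int) -> None:
        self.value = max(self.value, new_value)


@dataclass
class OneShotFence:
    """One-shot fence built from (UMF, wait value) plus a client promise.

    It belongs to the client's context, so a timeout has someone to blame.
    """
    ctx: ClientContext
    umf: UMF
    wait_value: int
    deadline: int  # simulated time by which the client promised to be done
    state: str = "pending"

    def poll(self, now: int) -> str:
        # The state is sticky: once signaled or timed out, it never changes.
        if self.state == "pending":
            if self.umf.value >= self.wait_value:
                self.state = "signaled"
            elif now >= self.deadline:
                # Broken promise: kick the client's context.
                self.ctx.device_lost = True
                self.state = "timed_out"
        return self.state


def stall_path_ready(umf: UMF, wait_value: int) -> bool:
    """High-latency path: a frame is usable only once the UMF has actually
    reached the wait value; until then the compositor keeps the old frame."""
    return umf.value >= wait_value
```

In this model nothing on the compositor side ever asks whether work has been submitted. It either polls a fence the client promised to signal, or it checks the UMF value directly and carries on with the previous frame.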