On Tue, Apr 20, 2021 at 1:54 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 7:45 PM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
> >
> > And something more concrete:
> >
> > dma_fence.
> >
> > This already has all of the properties described above. Kernel-wise,
> > it already devolves to CPU-side signaling when it crosses device
> > boundaries. We need to support it roughly forever since it's been
> > plumbed so far and so wide. Any primitive which is acceptable for
> > winsys-like usage which crosses so many device/subsystem/process/
> > security boundaries has to meet the same requirements. So why
> > reinvent something which looks so similar, and has the same
> > requirements of the kernel babysitting completion, providing little
> > to no benefit for that difference?
>
> So I can mostly get behind this, except it's _not_ going to be
> dma_fence. That thing has horrendous internal ordering constraints
> within the kernel, and the one thing that doesn't allow you is to make
> a dma_fence depend upon a userspace fence.

Let me elaborate on this a bit. One of the problems I mentioned earlier
is the conflation of fence types inside the kernel. dma_fence is used to
solve two semi-related but different problems: client command
synchronization and memory residency synchronization. In the old
implicit GL world, we conflated these two and thought we were doing
ourselves a favor. Not so much....

It's all well and good to say that we should turn the memory fence into
a dma_fence and throw a timeout on it. However, these window-system sync
primitives, as you said, have to be shareable across everything. In
particular, we have to be able to share them with drivers that don't
make a good separation between command and memory synchronization.

Let's say we're rendering on ANV with memory fences and presenting on
some USB display adapter whose kernel driver is a bit old-school.
When we pass that fence to the other driver via a sync_file or similar,
that driver may shove the dma_fence into the dma_resv on some buffer
somewhere. Then our client, completely unaware of internal kernel
dependencies, binds that buffer into its address space and kicks off
another command buffer. So i915 throws in a dependency on that dma_resv
which contains the previously created dma_fence and refuses to execute
any more command buffers until it signals. Unfortunately, unbeknownst
to i915, the command buffer which the client kicked off after doing
that bind was required to signal the memory fence on which our first
dma_fence depends. Deadlock.

Sure, we can put a timeout on the dma_fence and it will eventually fire
and unblock everything. However, there's one very important point
that's easy to miss here: neither i915 nor the client did anything
wrong in the above scenario. The Vulkan footgun approach works because
there is a set of rules and, if you follow those rules, you're
guaranteed everything works. In the above scenario, however, the client
followed all of the rules and got a deadlock anyway. We can't have
that.

> But what we can do is use the same currently existing container
> objects like drm_syncobj or sync_file (timeline syncobj would fit best
> tbh), and stuff a userspace fence behind it. The only trouble is that
> currently timeline syncobj implement vulkan's spec, which means if you
> build a wait-before-signal deadlock, you'll wait forever. Well until
> the user ragequits and kills your process.

Yeah, it may be that this approach can be made to work. Instead of
reusing dma_fence, maybe we can reuse syncobj and have another form of
syncobj which is a memory fence, a value to wait on, and a timeout.

--Jason
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel