On Wed, 6 Nov 2024 13:44:20 +0100
Christian König <christian.koenig@xxxxxxx> wrote:

> Am 06.11.24 um 09:14 schrieb Boris Brezillon:
> > On Tue, 5 Nov 2024 09:56:22 -0800
> > Chia-I Wu <olvaffe@xxxxxxxxx> wrote:
> >
> >> On Mon, Nov 4, 2024 at 11:32 PM Christian König
> >> <christian.koenig@xxxxxxx> wrote:
> >>> Am 04.11.24 um 22:32 schrieb Chia-I Wu:
> >>>
> >>> On Tue, Oct 22, 2024 at 10:24 AM Chia-I Wu <olvaffe@xxxxxxxxx> wrote:
> >>>
> >>> On Tue, Oct 22, 2024 at 9:53 AM Christian König
> >>> <christian.koenig@xxxxxxx> wrote:
> >>>
> >>> Am 22.10.24 um 18:18 schrieb Chia-I Wu:
> >>>
> >>> Userspace might poll a syncobj with the query ioctl. Call
> >>> dma_fence_enable_sw_signaling to ensure dma_fence_is_signaled returns
> >>> true in finite time.
> >>>
> >>> Wait a second, just querying the fence status is absolutely not
> >>> guaranteed to return true in finite time. That is well documented on
> >>> the dma_fence() object.
> >>>
> >>> When you want to poll on signaling from userspace you really need to
> >>> call poll or the wait IOCTL with a zero timeout. That will also
> >>> return immediately but should enable signaling while doing so.
> >>>
> >>> So just querying the status should absolutely *not* enable signaling.
> >>> That's an intentional separation.
> >>>
> >>> I think it depends on what semantics DRM_IOCTL_SYNCOBJ_QUERY should
> >>> have.
> >>>
> >>> Well, that's what I pointed out. The behavior of the QUERY IOCTL is
> >>> based on the behavior of the dma_fence, and the latter is documented
> >>> to do exactly what it currently does.
> >>>
> >>> If DRM_IOCTL_SYNCOBJ_QUERY is mainly for Vulkan timeline semaphores,
> >>> it is a bit heavy if userspace has to do a
> >>> DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT before a query.
> >>>
> >>> Maybe you misunderstood me: you *only* have to call
> >>> DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT and *not* _QUERY.
> >>>
> >>> The underlying dma_fence_wait_timeout() function is extra optimized
> >>> so that a zero timeout has only minimal overhead.
> >>>
> >>> This overhead is actually lower than that of _QUERY, because _QUERY
> >>> actually queries the driver for the current status, while _WAIT just
> >>> assumes that the driver will signal the fence from an interrupt when
> >>> it is ready.
> >> The context here is that vkGetSemaphoreCounterValue calls QUERY to get
> >> the timeline value. WAIT does not replace QUERY.
>
> Oh, that is a really good argument.
>
> >> Taking a step back, in the binary (signaled/unsignaled) case, a WAIT
> >> with zero timeout can get the up-to-date status. But in the timeline
> >> case, there is no direct way to get the up-to-date status if QUERY
> >> must strictly be a wrapper for dma_fence_is_signaled. It comes back
> >> to the question of what QUERY was designed for and whether we can
> >> change it.
>
> Yeah, that is a really good point. If _QUERY implements a different
> functionality than _WAIT, then the usual distinction between the
> polling and the interrupt-based approach can't be applied here.
>
> >>> I filed a Mesa issue,
> >>> https://gitlab.freedesktop.org/mesa/mesa/-/issues/12094, and Faith
> >>> suggested a kernel-side fix as well. Should we reconsider this?
> >>>
> >>> Wait a second, you might have an even bigger misconception here. The
> >>> difference between waiting and querying is usually intentional!
> >>>
> >>> This is done so that, for example, on mobile devices you don't need
> >>> to enable device interrupts, but rather query in defined intervals.
> >>>
> >>> This is a very common design pattern, and while I don't know the
> >>> wording of the Vulkan timeline extension, it's quite likely that
> >>> this is the intended use case.
> >> Yeah, there are Vulkan CTS tests that query timeline semaphores
> >> repeatedly for progress. Those tests can fail because Mesa translates
> >> the queries directly to the QUERY ioctl.
> >>
> >> As things are, enable_signaling is a requirement to query up-to-date
> >> status no matter whether the syncobj is binary or a timeline.
> > I kinda agree with Chia-I here. What's the point of querying a
> > timeline syncobj if what we get in return is an outdated sync point?
> > I get that the overhead of enabling signalling exists, but if we stick
> > to this position, that means the QUERY ioctl is not suitable for
> > vkGetSemaphoreCounterValue() unless we first add a
> > WAIT(sync_point=0,timeout=0) to make sure signalling is enabled on all
> > fences contained by the dma_fence_chain backing the timeline syncobj.
> > And this has to be done for each vkGetSemaphoreCounterValue(), because
> > new sync points don't have signalling enabled by default.
>
> The common driver design I know from other operating systems is that
> you either poll without enabling interrupts, or you enable interrupts
> and wait for the async operation to complete.

The problem is that we don't really poll if we call ::signaled() on a
dma_fence_array: the test is based on the state the container was in at
creation time. The only way to poll a fence_array right now is to enable
signalling. So maybe that's the problem we should solve here: make
dma_fence_array_signaled() iterate over the fences instead of checking
::num_pending and returning false if it's not zero (see the diff at the
end of this email).

> That distinction is really important for things like mobile devices
> where interrupts are rather power costly.

Sure, I get the importance of keeping interrupts disabled for
power-savings.

> > At the very least, we should add a new DRM_SYNCOBJ_QUERY_FLAGS_ flag
> > (DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE?) to combine the
> > enable_signalling and query operations in a single ioctl. If we go
> > for this approach, that means Mesa has to support both cases, and
> > pick the most optimal one if the kernel supports it.
>
> Another problem we have here is that dma_fence_enable_signaling() can
> mean two different things, maybe the documentation is a bit confusing:
>
> 1) Enabling interrupts so that we don't need to call the
> ops->is_signaled() driver callback.
>
> 2) Asking preemption fences to actually signal because memory
> management wants to do something.

Uh, I wasn't aware of (2). If it's documented somewhere, I probably
missed that part.

>
> So when this ops->is_signaled() callback is implemented it shouldn't be
> necessary to enable signaling to query the state.

Agree on that, see my comment above about dma_fence_array() not
necessarily doing the right thing here.

>
> To summarize: when you call _QUERY you shouldn't get an outdated sync
> point. When you do, then there is something wrong with the backend
> driver.

If by backend you mean the dma_fence_array implementation, you're
probably right.

>
> Regards,
> Christian.
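
For reference, here is roughly what the WAIT(sync_point=0,timeout=0) +
QUERY userspace workaround mentioned above could look like. This is only
a sketch, not something that exists in Mesa: the helper name is made up,
error handling is minimal, the ETIME handling on a zero-timeout wait and
the exact include paths depend on the libdrm setup.

#include <errno.h>
#include <stdint.h>
#include <drm.h>      /* drm_syncobj_* structs and DRM_IOCTL_SYNCOBJ_* */
#include <xf86drm.h>  /* drmIoctl() */

static int syncobj_query_current_point(int drm_fd, uint32_t handle,
				       uint64_t *point)
{
	uint64_t wait_point = 0;
	struct drm_syncobj_timeline_wait wait = {
		.handles = (uintptr_t)&handle,
		.points = (uintptr_t)&wait_point,
		.timeout_nsec = 0, /* zero (absolute) timeout: just poll */
		.count_handles = 1,
	};
	struct drm_syncobj_timeline_array query = {
		.handles = (uintptr_t)&handle,
		.points = (uintptr_t)point,
		.count_handles = 1,
	};

	/*
	 * The zero-timeout wait is only there to get signalling enabled on
	 * the fences backing the syncobj; ETIME just means nothing is
	 * signalled yet, which is fine for a query.
	 */
	if (drmIoctl(drm_fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &wait) &&
	    errno != ETIME)
		return -errno;

	return drmIoctl(drm_fd, DRM_IOCTL_SYNCOBJ_QUERY, &query) ? -errno : 0;
}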
--->8---
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 8a08ffde31e7..c9222a065954 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -104,8 +104,22 @@ static bool dma_fence_array_signaled(struct dma_fence *fence)
 {
 	struct dma_fence_array *array = to_dma_fence_array(fence);
 
-	if (atomic_read(&array->num_pending) > 0)
-		return false;
+	if (atomic_read(&array->num_pending) > 0) {
+		struct dma_fence *subfence;
+		unsigned i;
+
+		/*
+		 * If signaling is enabled, we don't need to iterate over
+		 * fences to get the state, we can rely on num_pending > 0.
+		 */
+		if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+			return false;
+
+		dma_fence_array_for_each(subfence, i, fence) {
+			if (!dma_fence_is_signaled(subfence))
+				return false;
+		}
+	}
 
 	dma_fence_array_clear_pending_error(array);
 	return true;
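
Purely as an illustration of the DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE
idea floated above: the flag does not exist in the uAPI, its name comes
from the discussion and the bit value below is just a guess, so treat
this as a sketch of how userspace could opt in and fall back to the
zero-timeout WAIT + QUERY sequence on older kernels (which are expected
to reject unknown query flags with EINVAL).

/* Hypothetical flag, not part of the uAPI; value assumed to be the next
 * bit after DRM_SYNCOBJ_QUERY_FLAGS_LAST_SUBMITTED. */
#define DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE (1 << 1)

static int syncobj_query_refreshed_point(int drm_fd, uint32_t handle,
					 uint64_t *point)
{
	struct drm_syncobj_timeline_array query = {
		.handles = (uintptr_t)&handle,
		.points = (uintptr_t)point,
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE,
	};

	if (drmIoctl(drm_fd, DRM_IOCTL_SYNCOBJ_QUERY, &query) == 0)
		return 0;

	/* Unknown flag rejected: fall back to WAIT(point=0,timeout=0) +
	 * QUERY as sketched earlier. */
	if (errno != EINVAL)
		return -errno;

	return syncobj_query_current_point(drm_fd, handle, point);
}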