On Wed, 6 Nov 2024 13:44:20 +0100 Christian König <christian.koenig@xxxxxxx> wrote:

On 06.11.24 at 09:14, Boris Brezillon wrote:

[SNIP]

I filed a Mesa issue, https://gitlab.freedesktop.org/mesa/mesa/-/issues/12094, and Faith suggested a kernel-side fix as well. Should we reconsider this?

Wait a second, you might have an even bigger misconception here. The difference between waiting and querying is usually intentional! This is done so that, for example on mobile devices, you don't need to enable device interrupts, but rather query at defined intervals. This is a very common design pattern, and while I don't know the wording of the Vulkan timeline extension, it's quite likely that this is the intended use case.

Yeah, there are Vulkan CTS tests that query timeline semaphores repeatedly for progress. Those tests can fail because mesa translates the queries directly to the QUERY ioctl. As things are, enable_signaling is a requirement to query up-to-date status, no matter whether the syncobj is binary or a timeline.

I kinda agree with Chia-I here. What's the point of querying a timeline syncobj if what we get in return is an outdated sync point? I get that the overhead of enabling signalling exists, but if we stick to this position, it means the QUERY ioctl is not suitable for vkGetSemaphoreCounterValue() unless we first add a WAIT(sync_point=0,timeout=0) to make sure signalling is enabled on all fences contained by the dma_fence_chain backing the timeline syncobj. And this has to be done for each vkGetSemaphoreCounterValue() call, because new sync points don't have signalling enabled by default.

The common driver design I know from other operating systems is that you either poll without enabling interrupts, or you enable interrupts and wait for the async operation to complete.

The problem is that we don't really poll if we call ::signaled() on a dma_fence_array. The test is based on the state the container was in at creation time. The only way to poll a fence_array right now is to enable signalling. So maybe that's the problem we should solve here: make dma_fence_array_signaled() iterate over the fences instead of checking ::num_pending and returning false if it's not zero (see the diff at the end of this email).
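(For reference, a minimal, untested sketch of the WAIT(sync_point=0,timeout=0) + QUERY dance described above, written against the libdrm syncobj wrappers. The helper name and the error handling are mine, not something mesa actually does today.)

#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <xf86drm.h>

/*
 * Hypothetical fallback: poke the timeline with a zero-timeout wait so the
 * kernel looks at (and enables signaling on) the fences backing it, then
 * read the current value back with the QUERY ioctl.
 */
static int query_timeline_value(int fd, uint32_t syncobj, uint64_t *value)
{
	uint64_t point = 0;
	int ret;

	/*
	 * Absolute timeout 0 is already in the past, so this returns right
	 * away; -ETIME just means the point isn't signaled yet, which is
	 * expected here.
	 */
	ret = drmSyncobjTimelineWait(fd, &syncobj, &point, 1, 0, 0, NULL);
	if (ret && ret != -ETIME)
		return ret;

	return drmSyncobjQuery(fd, &syncobj, value, 1);
}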
Oh, really good point as well. I wasn't aware that we have this problem in dma-fence-array.
That distinction is really important for things like mobile devices where interrupts are rather power costly.

Sure, I get the importance of keeping interrupts disabled for power savings. At the very least, we should add a new DRM_SYNCOBJ_QUERY_FLAGS_ flag (DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE?) to combine the enable_signalling and query operations in a single ioctl. If we go for this approach, that means mesa has to support both cases, and pick the most optimal one if the kernel supports it.

Another problem we have here is that dma_fence_enable_signaling() can mean two different things; maybe the documentation is a bit confusing:

1) Enabling interrupts so that we don't need to call the ops->is_signaled() driver callback.

2) Asking preemption fences to actually signal because memory management wants to do something.

Uh, I wasn't aware of (2). If it's documented somewhere, I probably missed that part.

So when this ops->is_signaled() callback is implemented, it shouldn't be necessary to enable signaling to query the state.

Agree on that, see my comment about fence_array() not necessarily doing the right thing here.

To summarize: when you call _QUERY you shouldn't get an outdated sync point. If you do, then there is something wrong with the backend driver.

If by backend you mean the dma_fence_array implementation, you're probably right.
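(Purely as an illustration of the DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE idea above, here is a hypothetical, untested kernel-side helper that the query ioctl could call on the chain head before its usual seqno walk. The name and placement are made up, and locking/refcounting around the syncobj fence is left out.)

#include <linux/dma-fence.h>
#include <linux/dma-fence-chain.h>

/*
 * Hypothetical helper for a DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE flag:
 * walk the dma_fence_chain backing the timeline and enable software
 * signaling on every link, so the dma_fence_is_signaled() checks done by
 * the query afterwards reflect the current state rather than whatever was
 * known when the fence containers were created.
 */
static void syncobj_query_refresh_state(struct dma_fence *chain_head)
{
	struct dma_fence *iter;

	dma_fence_chain_for_each(iter, chain_head)
		dma_fence_enable_sw_signaling(iter);
}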
I'm going to take a look at the dma_fence_array implementation and the documentation on dma_fence behavior we have.
Thanks a lot for pointing all that out,
Christian.
Regards,
Christian.

--->8---
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 8a08ffde31e7..c9222a065954 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -104,8 +104,22 @@ static bool dma_fence_array_signaled(struct dma_fence *fence)
 {
 	struct dma_fence_array *array = to_dma_fence_array(fence);
 
-	if (atomic_read(&array->num_pending) > 0)
-		return false;
+	if (atomic_read(&array->num_pending) > 0) {
+		struct dma_fence *subfence;
+		unsigned i;
+
+		/*
+		 * If signaling is enabled, we don't need to iterate over
+		 * fences to get the state, we can rely on num_pending > 0.
+		 */
+		if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+			return false;
+
+		dma_fence_array_for_each(subfence, i, fence) {
+			if (!dma_fence_is_signaled(subfence))
+				return false;
+		}
+	}
 
 	dma_fence_array_clear_pending_error(array);
 	return true;