Re: [PATCH v2] drm/syncobj: ensure progress for syncobj queries

On 06.11.24 at 09:14, Boris Brezillon wrote:
On Tue, 5 Nov 2024 09:56:22 -0800
Chia-I Wu <olvaffe@xxxxxxxxx> wrote:

On Mon, Nov 4, 2024 at 11:32 PM Christian König
<christian.koenig@xxxxxxx> wrote:
On 04.11.24 at 22:32, Chia-I Wu wrote:

On Tue, Oct 22, 2024 at 10:24 AM Chia-I Wu <olvaffe@xxxxxxxxx> wrote:

On Tue, Oct 22, 2024 at 9:53 AM Christian König
<christian.koenig@xxxxxxx> wrote:

On 22.10.24 at 18:18, Chia-I Wu wrote:

Userspace might poll a syncobj with the query ioctl.  Call
dma_fence_enable_sw_signaling to ensure dma_fence_is_signaled returns
true in finite time.

Wait a second, just querying the fence status is absolutely not
guaranteed to return true in finite time. That is well documented on the
dma_fence object.

When you want to poll on signaling from userspace you really need to
call poll or the wait IOCTL with a zero timeout. That will also return
immediately but should enable signaling while doing that.

So just querying the status should absolutely *not* enable signaling.
That's an intentional separation.

I think it depends on what semantics DRM_IOCTL_SYNCOBJ_QUERY should have.


Well that's what I pointed out. The behavior of the QUERY IOCTL is based on the behavior of the dma_fence, and the latter is documented to do exactly what it currently does.

If DRM_IOCTL_SYNCOBJ_QUERY is mainly for Vulkan timeline semaphores,
it is a bit heavy if userspace has to do a
DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT before every query.


Maybe you misunderstood me, you *only* have to call DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT and *not* _QUERY.

The underlying dma_fence_wait_timeout() function is extra optimized so that zero timeout has only minimal overhead.

This overhead is actually lower than _QUERY's, because that one queries the driver for the current status, while _WAIT just assumes that the driver will signal the fence from an interrupt when it is ready.
The context here is that vkGetSemaphoreCounterValue calls QUERY to get
the timeline value.  WAIT does not replace QUERY.

Oh, that is a really good argument.

Taking a step back: in the binary (signaled/unsignaled) case, a WAIT
with zero timeout can get the up-to-date status.  But in the timeline
case, there is no direct way to get the up-to-date status if QUERY
must strictly be a wrapper for dma_fence_is_signaled.  It comes back
to what QUERY was designed for and whether we can change it.

Yeah, that is a really good point. If _QUERY implements different functionality than _WAIT, then the usual distinction between the polling and interrupt-based approaches can't be applied here.

I filed a Mesa issue,
https://gitlab.freedesktop.org/mesa/mesa/-/issues/12094, and Faith
suggested a kernel-side fix as well.  Should we reconsider this?


Wait a second, you might have an even bigger misconception here. The difference between waiting and querying is usually intentional!

This is done so that for example on mobile devices you don't need to enable device interrupts, but rather query in defined intervals.

This is a very common design pattern, and while I don't know the wording of the Vulkan timeline extension, it's quite likely that this is the intended use case.
Yeah, there are Vulkan CTS tests that query timeline semaphores
repeatedly for progress.  Those tests can fail because mesa translates
the queries directly to the QUERY ioctl.

As things are, enable_signaling is a requirement to query up-to-date
status no matter whether the syncobj is binary or a timeline.
I kinda agree with Chia-I here. What's the point of querying a timeline
syncobj if what we get in return is an outdated sync point? I get that
the overhead of enabling signalling exists, but if we stand on this
position, that means the QUERY ioctl is not suitable for
vkGetSemaphoreCounterValue() unless we first add a
WAIT(sync_point=0,timeout=0) to make sure signalling is enabled on all
fences contained by the dma_fence_chain backing the timeline syncobj.
And this has to be done for each vkGetSemaphoreCounterValue(), because
new sync points don't have signalling enabled by default.

The common driver design I know from other operating systems is that you either poll without enabling interrupts or you enable interrupts and wait for the async operation to complete.

That distinction is really important for things like mobile devices where interrupts are rather power costly.

At the very least, we should add a new DRM_SYNCOBJ_QUERY_FLAGS_ flag
(DRM_SYNCOBJ_QUERY_FLAGS_REFRESH_STATE?) to combine the
enable_signaling and query operations in a single ioctl. If we go
for this approach, that means mesa has to support both cases, and pick
the optimal one when the kernel supports it.

Another problem we have here is that dma_fence_enable_signaling() can mean two different things; maybe the documentation is a bit confusing:

1) Enabling interrupts so that we don't need to call the ops->is_signaled() driver callback.

2) Asking preemption fences to actually signal because memory management wants to do something.

So when this ops->is_signaled() callback is implemented, it shouldn't be necessary to enable signaling to query the state.

To summarize: when you call _QUERY, you shouldn't get an outdated sync point. If you do, then there is something wrong with the backend driver.

Regards,
Christian.
