Hi Christian, Am Montag, dem 27.01.2025 um 17:14 +0100 schrieb Christian König: > Am 27.01.25 um 17:02 schrieb Lucas Stach: > > This effectively reverts 0fea2ed61e7f ("drm/amdgpu: Remove call to > > reservation_object_test_signaled_rcu before wait"), as the premise of > > that commit is wrong. dma_resv_wait_timeout() without a timeout will > > not turn into a wait-free dma_resv_test_signaled, but will wait a > > jiffy for unsignaled fences, which is not at all what userspace expects > > when it calls GEM_WAIT_IDLE with a timeout of 0. > > Marek pinged me about that strange behavior as well. That sounds like a > separate bug in dma_resv_wait_timeout() to me. > > I don't see why the function should be waiting with a timeout of 0 and > we have multiple cases where that is assumed. > > What should happen is that dma_resv_wait_timeout() should return 1 when > all fences are signaled even when the timeout is 0. > > My educated guess is that this behavior is somehow broken and instead we > wait for at least 1 jiffies. > > This here is probably just the tip of the iceberg. > dma_resv_wait_timeout() explicitly sets timeout to a single jiffy when it is entered with timeout == 0. This behavior was introduced with your commit 06a66b5c77ed ("reservation: revert "wait only with non-zero timeout specified (v3)" v2"), which seems to fix a real bug. Between the two behaviors I think it is wrong for callers of dma_resv_wait_timeout() to assume that the call is necessarily wait- free in case of timeout == 0. If you want wait-free but are able to deal with false positive unsignaled returns use dma_resv_test_signaled, otherwise use dma_resv_wait_timeout to get accurate answers with a possible wait. Regards, Lucas > Regards, > Christian. > > > > > Signed-off-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx> > > --- > > This is most likely the correct kernel-side solution for the unexpected > > slowdown worked around with in userspace with this Mesa series: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32877 > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > index 1a5df8b94661..75b5d5149911 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > @@ -567,8 +567,13 @@ int amdgpu_gem_wait_idle_ioctl(struct drm_device *dev, void *data, > > return -ENOENT; > > > > robj = gem_to_amdgpu_bo(gobj); > > - ret = dma_resv_wait_timeout(robj->tbo.base.resv, DMA_RESV_USAGE_READ, > > - true, timeout); > > + if (timeout == 0) > > + ret = dma_resv_test_signaled(robj->tbo.base.resv, > > + DMA_RESV_USAGE_READ); > > + else > > + ret = dma_resv_wait_timeout(robj->tbo.base.resv, > > + DMA_RESV_USAGE_READ, > > + true, timeout); > > > > /* ret == 0 means not signaled, > > * ret > 0 means signaled >