RE: [PATCH 2/3] Revert "drm/i915: Propagate errors on awaiting already signaled fences"

Acked-by: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>

> -----Original Message-----
> From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> Sent: Thursday, May 5, 2022 2:22 AM
> To: gfx-internal-devel@xxxxxxxxxxxxxxxxx
> Cc: Teres Alexis, Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>; Harrison,
> John C <john.c.harrison@xxxxxxxxx>; Jason Ekstrand
> <jason@xxxxxxxxxxxxxx>; Slusarz, Marcin <marcin.slusarz@xxxxxxxxx>;
> stable@xxxxxxxxxxxxxxx; Jason Ekstrand <jason.ekstrand@xxxxxxxxx>; Daniel
> Vetter <daniel.vetter@xxxxxxxx>; Bloomfield, Jon
> <jon.bloomfield@xxxxxxxxx>
> Subject: [PATCH 2/3] Revert "drm/i915: Propagate errors on awaiting already
> signaled fences"
> 
> From: Jason Ekstrand <jason@xxxxxxxxxxxxxx>
> 
> This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7.  Ever
> since that commit, we've been having issues where a hang in one client
> can propagate to another.  In particular, a hang in an app can propagate
> to the X server which causes the whole desktop to lock up.
> 
> Error propagation along fences sounds like a good idea, but as your bug
> shows, it has surprising consequences, since propagating errors across
> security boundaries is not a good thing.
> 
> What we do have is tracking of hangs on the ctx, with the information
> reported to userspace using RESET_STATS. That's how ARB_robustness works.
> Also, if my understanding is still correct, execbuf returns EIO when your
> context is banned (because it is not recoverable or has had too many
> hangs). In all these cases it's up to userspace to figure out what is
> impacted and what should be reported to the application; it's not on the
> kernel to guess and automatically propagate.
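A rough illustration of the userspace side described above (a sketch, not Mesa's or i915's code): given two snapshots of a context's reset counters, userspace can decide for itself whether the context caused a hang or was only affected by one, in the spirit of ARB_robustness's guilty/innocent reset statuses. The `reset_stats` struct is a hypothetical local mirror of the `batch_active` and `batch_pending` counters in the uapi `struct drm_i915_reset_stats`; the snapshot-diffing policy and all names here are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of two per-context counters from the i915 uapi
 * struct drm_i915_reset_stats, so this sketch builds without kernel headers. */
struct reset_stats {
	uint32_t batch_active;  /* batches lost while executing on the GPU */
	uint32_t batch_pending; /* batches lost while still queued */
};

/* Illustrative statuses, loosely following ARB_robustness's guilty/innocent split. */
enum reset_status {
	RESET_NONE,     /* no new batch loss for this context */
	RESET_GUILTY,   /* this context's own batch was running at reset time */
	RESET_INNOCENT, /* this context only lost work that was still queued */
};

/* Assumed policy: compare against the previous snapshot so each loss is
 * reported once, and rank "active" losses above "pending" ones. */
enum reset_status classify_reset(const struct reset_stats *prev,
				 const struct reset_stats *now)
{
	assert(prev && now);

	if (now->batch_active > prev->batch_active)
		return RESET_GUILTY;
	if (now->batch_pending > prev->batch_pending)
		return RESET_INNOCENT;
	return RESET_NONE;
}
```

Under this policy a context whose own batches never hung is never classified as RESET_GUILTY, and what to do about an innocent reset stays a userspace decision rather than something the kernel imposes.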
> 
> What's more, we're also building more features on top of ctx error
> reporting with the RESET_STATS ioctl: encrypted buffers use the same
> mechanism, and the userspace fence wait also relies on it. So it is the
> path going forward for reporting GPU hangs and resets to userspace.
> 
> All together, that's why I think we should just bury this idea again as
> not quite the direction we want to go in, and why I think the revert is
> the right option here.
> 
> For backporters: Please note that you _must_ have a backport of
> https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@xxxxxxxxxxxxxx/
> since otherwise backporting just this patch opens up a security bug.
> 
> v2: Augment commit message. Also restore Jason's sob that I
> accidentally lost.
> 
> v3: Add a note for backporters
> 
> Signed-off-by: Jason Ekstrand <jason@xxxxxxxxxxxxxx>
> Reported-by: Marcin Slusarz <marcin.slusarz@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v5.6+
> Cc: Jason Ekstrand <jason.ekstrand@xxxxxxxxx>
> Cc: Marcin Slusarz <marcin.slusarz@xxxxxxxxx>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> Acked-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Reviewed-by: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>
> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@xxxxxxxxxxxxxx
> (cherry picked from commit 93a2711cddd5760e2f0f901817d71c93183c3b87)
> ---
>  drivers/gpu/drm/i915/i915_request.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 8bd484e2a0ec..08484c14d11e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1426,10 +1426,8 @@ i915_request_await_execution(struct i915_request *rq,
> 
>  	do {
>  		fence = *child++;
> -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> -			i915_sw_fence_set_error_once(&rq->submit, fence->error);
> +		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>  			continue;
> -		}
> 
>  		if (fence->context == rq->fence.context)
>  			continue;
> @@ -1527,10 +1525,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
> 
>  	do {
>  		fence = *child++;
> -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> -			i915_sw_fence_set_error_once(&rq->submit, fence->error);
> +		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>  			continue;
> -		}
> 
>  		/*
>  		 * Requests on the same timeline are explicitly ordered, along
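To make the behaviour change in the two hunks concrete, here is a toy user-space model of the per-fence check. It is a sketch only: `toy_fence`, `toy_request`, `toy_set_error_once()`, `toy_await()` and `toy_demo()` are hypothetical simplifications, not the kernel's `struct dma_fence` or `struct i915_request`. Before the revert, awaiting a fence that had already signaled with an error copied that error into the waiting request's submit fence, so a hung client's `-EIO` could land on another client's request. After the revert, an already-signaled fence is simply skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dma_fence: only the two fields the hunks read. */
struct toy_fence {
	bool signaled; /* models DMA_FENCE_FLAG_SIGNALED_BIT in fence->flags */
	int error;     /* 0 on success, negative errno (e.g. -EIO) on failure */
};

/* Hypothetical stand-in for the error recorded on rq->submit. */
struct toy_request {
	int submit_error;
};

/* Roughly models i915_sw_fence_set_error_once(): record a non-zero error
 * only if no error has been recorded yet. */
void toy_set_error_once(struct toy_request *rq, int error)
{
	if (error && !rq->submit_error)
		rq->submit_error = error;
}

/* Models one iteration of the await loop. 'propagate' selects the behaviour
 * before the revert (true) or after it (false). Returns true if the caller
 * still has to wait on the fence. */
bool toy_await(struct toy_request *rq, const struct toy_fence *fence,
	       bool propagate)
{
	assert(rq && fence);

	if (fence->signaled) {
		if (propagate)
			toy_set_error_once(rq, fence->error);
		return false; /* already signaled: nothing left to wait for */
	}
	return true;
}

/* Client A's request hung and its fence signaled with -EIO; client B's
 * request then awaits that fence. */
void toy_demo(void)
{
	struct toy_fence hung = { .signaled = true, .error = -EIO };
	struct toy_request before_revert = { 0 };
	struct toy_request after_revert = { 0 };

	toy_await(&before_revert, &hung, true);
	toy_await(&after_revert, &hung, false);

	assert(before_revert.submit_error == -EIO); /* A's error leaks into B */
	assert(after_revert.submit_error == 0);     /* B is unaffected */
}
```

The model deliberately leaves out the rest of the loop (the same-context and same-timeline checks and the actual wait on unsignaled fences).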



