Re: [PATCH 3/9] drm/i915: Prevent using semaphores to chain up to external fences

Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> · Fri, 08 May 2020 18:37:15 +0300

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> The downside of using semaphores is that we lose metadata passing
> along the signaling chain. This is particularly nasty when we
> need to pass along a fatal error such as EFAULT or EDEADLK. For
> fatal errors we want to scrub the request before it is executed,
> which means that we cannot preload the request onto HW and have
> it wait upon a semaphore.

So the case is: b is waiting on a, a fails, and we want to release b
with an error?

>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_request.c         | 26 +++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler_types.h |  1 +
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 94189c7d43cd..f0f9393e2ade 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1002,6 +1002,15 @@ emit_semaphore_wait(struct i915_request *to,
>  	if (!rcu_access_pointer(from->hwsp_cacheline))
>  		goto await_fence;
>  
> +	/*
> +	 * If this or its dependents are waiting on an external fence
> +	 * that may fail catastrophically, then we want to avoid using
> +	 * sempahores as they bypass the fence signaling metadata, and we

s/sempahores/semaphores/
-Mika

> +	 * lose the fence->error propagation.
> +	 */
> +	if (from->sched.flags & I915_SCHED_HAS_EXTERNAL_CHAIN)
> +		goto await_fence;
> +
>  	/* Just emit the first semaphore we see as request space is limited. */
>  	if (already_busywaiting(to) & mask)
>  		goto await_fence;
> @@ -1064,12 +1073,29 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>  			return ret;
>  	}
>  
> +	if (from->sched.flags & I915_SCHED_HAS_EXTERNAL_CHAIN)
> +		to->sched.flags |= I915_SCHED_HAS_EXTERNAL_CHAIN;
> +
>  	return 0;
>  }
>  
> +static void mark_external(struct i915_request *rq)
> +{
> +	/*
> +	 * The downside of using semaphores is that we lose metadata passing
> +	 * along the signaling chain. This is particularly nasty when we
> +	 * need to pass along a fatal error such as EFAULT or EDEADLK. For
> +	 * fatal errors we want to scrub the request before it is executed,
> +	 * which means that we cannot preload the request onto HW and have
> +	 * it wait upon a semaphore.
> +	 */
> +	rq->sched.flags |= I915_SCHED_HAS_EXTERNAL_CHAIN;
> +}
> +
>  static int
>  i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
>  {
> +	mark_external(rq);
>  	return i915_sw_fence_await_dma_fence(&rq->submit, fence,
>  					     fence->context ? I915_FENCE_TIMEOUT : 0,
>  					     I915_FENCE_GFP);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 7186875088a0..6ab2c5289bed 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -66,6 +66,7 @@ struct i915_sched_node {
>  	struct i915_sched_attr attr;
>  	unsigned int flags;
>  #define I915_SCHED_HAS_SEMAPHORE_CHAIN	BIT(0)
> +#define I915_SCHED_HAS_EXTERNAL_CHAIN	BIT(1)
>  	intel_engine_mask_t semaphores;
>  };
>  
> -- 
> 2.20.1