Re: [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset

Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> · Mon, 13 Feb 2017 12:03:02 +0200

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem.c          | 16 +++++++++++++++-
>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  3 +++
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3c2c2296c1dc..730804ce9610 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -35,6 +35,7 @@
>  #include "intel_frontbuffer.h"
>  #include "intel_mocs.h"
>  #include <linux/dma-fence-array.h>
> +#include <linux/kthread.h>
>  #include <linux/reservation.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
> @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  	for_each_engine(engine, dev_priv, id) {
>  		struct drm_i915_gem_request *request;
>  
> +		/* Prevent the signaler thread from updating the request
> +		 * state (by calling dma_fence_signal) as we are processing
> +		 * the reset. The write from the GPU of the seqno is
> +		 * asynchronous and the signaler thread may see a different
> +		 * value to us and declare the request complete, even though
> +		 * the reset routine have picked that request as the active
> +		 * (incomplete) request. This conflict is not handled
> +		 * gracefully!
> +		 */
> +		kthread_park(engine->breadcrumbs.signaler);
> +
>  		/* Prevent request submission to the hardware until we have
>  		 * completed the reset in i915_gem_reset_finish(). If a request
>  		 * is completed by one engine, it may then queue a request
> @@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
>  
>  	lockdep_assert_held(&dev_priv->drm.struct_mutex);
>  
> -	for_each_engine(engine, dev_priv, id)
> +	for_each_engine(engine, dev_priv, id) {
>  		tasklet_enable(&engine->irq_tasklet);
> +		kthread_unpark(engine->breadcrumbs.signaler);
> +	}
>  }
>  
>  static void nop_submit_request(struct drm_i915_gem_request *request)
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 9fd002bcebb6..f5e05343110a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
>  				break;
>  
>  			schedule();
> +
> +			if (kthread_should_park())
> +				kthread_parkme();
>  		}
>  	} while (1);
>  	__set_current_state(TASK_RUNNING);
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx