Re: [PATCH 06/20] drm/i915: Reinstate hang recovery work queue.

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 13 Jan 2016 21:01:29 +0000

On Wed, Jan 13, 2016 at 05:28:18PM +0000, Arun Siluvery wrote:
> From: Tomas Elf <tomas.elf@xxxxxxxxx>
> 
> There used to be a work queue separating the error handler from the hang
> recovery path, which was removed a while back in this commit:
> 
> 	commit b8d24a06568368076ebd5a858a011699a97bfa42
> 	Author: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> 	Date:   Wed Jan 28 17:03:14 2015 +0200
> 
> 	    drm/i915: Remove nested work in gpu error handling
> 
> Now we need to revert most of that commit since the work queue separating hang
> detection from hang recovery is needed in preparation for the upcoming watchdog
> timeout feature. The watchdog interrupt service routine will be a second
> callsite of the error handler alongside the periodic hang checker, which runs
> in a work queue context. Seeing as the error handler will be serving a caller
> in a hard interrupt execution context that means that the error handler must
> never end up in a situation where it needs to grab the struct_mutex.
> Unfortunately, that is exactly what we need to do first at the start of the
> hang recovery path, which might potentially sleep if the struct_mutex is
> already held by another thread. Not good when you're in a hard interrupt
> context.

We also would not dream of running i915_handle_error() from inside an
interrupt handler anyway as the capture is too heavy...

> Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_dma.c |  1 +
>  drivers/gpu/drm/i915/i915_drv.h |  1 +
>  drivers/gpu/drm/i915/i915_irq.c | 31 ++++++++++++++++++++++++-------
>  3 files changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index c45ec353..67003c2 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1203,6 +1203,7 @@ int i915_driver_unload(struct drm_device *dev)
>  	/* Free error state after interrupts are fully disabled. */
>  	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>  	i915_destroy_error_state(dev);
> +	cancel_work_sync(&dev_priv->gpu_error.work);

This should be before the destroy as we could be in the process of
resetting state but after the cancel(hangcheck_work), as that may queue
the error_work.

> @@ -2827,7 +2830,21 @@ void i915_handle_error(struct drm_device *dev, u32 engine_mask, bool wedged,
>  		i915_error_wake_up(dev_priv, false);
>  	}
>  
> -	i915_reset_and_wakeup(dev);
> +	/*
> +	 * Gen 7:
> +	 *

Gen 7? A little misleading
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx