On Wed, Jan 13, 2016 at 05:28:18PM +0000, Arun Siluvery wrote: > From: Tomas Elf <tomas.elf@xxxxxxxxx> > > There used to be a work queue separating the error handler from the hang > recovery path, which was removed a while back in this commit: > > commit b8d24a06568368076ebd5a858a011699a97bfa42 > Author: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > Date: Wed Jan 28 17:03:14 2015 +0200 > > drm/i915: Remove nested work in gpu error handling > > Now we need to revert most of that commit since the work queue separating hang > detection from hang recovery is needed in preparation for the upcoming watchdog > timeout feature. The watchdog interrupt service routine will be a second > callsite of the error handler alongside the periodic hang checker, which runs > in a work queue context. Seeing as the error handler will be serving a caller > in a hard interrupt execution context that means that the error handler must > never end up in a situation where it needs to grab the struct_mutex. > Unfortunately, that is exactly what we need to do first at the start of the > hang recovery path, which might potentially sleep if the struct_mutex is > already held by another thread. Not good when you're in a hard interrupt > context. We also would not dream of running i915_handle_error() from inside an interrupt handler anyway as the capture is too heavy... > Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx> > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_dma.c | 1 + > drivers/gpu/drm/i915/i915_drv.h | 1 + > drivers/gpu/drm/i915/i915_irq.c | 31 ++++++++++++++++++++++++------- > 3 files changed, 26 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > index c45ec353..67003c2 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -1203,6 +1203,7 @@ int i915_driver_unload(struct drm_device *dev) > /* Free error state after interrupts are fully disabled. */ > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); > i915_destroy_error_state(dev); > + cancel_work_sync(&dev_priv->gpu_error.work); This should be before the destroy as we could be in the process of resetting state but after the cancel(hangcheck_work), as that may queue the error_work. > @@ -2827,7 +2830,21 @@ void i915_handle_error(struct drm_device *dev, u32 engine_mask, bool wedged, > i915_error_wake_up(dev_priv, false); > } > > - i915_reset_and_wakeup(dev); > + /* > + * Gen 7: > + * Gen 7? A little misleading -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx