On Thu, 3 Jul 2014 08:09:01 +0100 Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> Since we rely on hangcheck to wake up and kick us out of an indefinite
> wait should the GPU ever stop functioning, it appears sensible that we
> should check that hangcheck is indeed active before starting that wait.
> This just prevents a driver error in the processing of hangcheck from
> appearing to hang the machine.
> 
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>  drivers/gpu/drm/i915/i915_gem.c         |  2 ++
>  drivers/gpu/drm/i915/i915_irq.c         | 11 +++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>  4 files changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8670d05..c326a99 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2149,6 +2149,7 @@ extern void intel_console_resume(struct work_struct *work);
> 
>  /* i915_irq.c */
>  void i915_queue_hangcheck(struct drm_device *dev);
> +void i915_check_hangcheck(struct drm_device *dev);
>  __printf(3, 4)
>  void i915_handle_error(struct drm_device *dev, bool wedged,
>  		       const char *fmt, ...);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ecb835b..120ed6d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1455,6 +1455,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
>  			break;
>  		}
> 
> +		i915_check_hangcheck(ring->dev);
> +
>  		timer.function = NULL;
>  		if (timeout || missed_irq(dev_priv, ring)) {
>  			unsigned long expire;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9e5a295..daa590e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3063,6 +3063,17 @@ void i915_queue_hangcheck(struct drm_device *dev)
>  		  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
>  }
> 
> +void i915_check_hangcheck(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	if (!i915.enable_hangcheck)
> +		return;
> +
> +	if (!timer_pending(&dev_priv->gpu_error.hangcheck_timer))
> +		mod_timer(&dev_priv->gpu_error.hangcheck_timer,
> +			  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
> +}
> +
>  static void ibx_irq_reset(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4f3397f..6365d4d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1634,6 +1634,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>  			master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
>  		}
> 
> +		i915_check_hangcheck(dev);
> +
>  		io_schedule_timeout(1);
> 
>  		if (dev_priv->mm.interruptible && signal_pending(current)) {

Are there any bugs associated with this?

i915_rearm_hangcheck() or something might more accurately describe
what's going on here.

I suppose both of these paths are protected by the struct_mutex?  If
not, might we race and mod_timer() this twice from two threads in
succession?  I guess that's harmless...

-- 
Jesse Barnes, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx