On Fri, Jan 23, 2015 at 02:44:07PM +0200, Mika Kuoppala wrote: > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > When run as a timer, i915_hangcheck_elapsed() must adhere to all the > rules of running in a softirq context. This is advantageous to us as we > want to minimise the risk that a driver bug will prevent us from > detecting a hung GPU. However, that is irrelevant if the driver bug > prevents us from resetting and recovering. Still it is prudent not to > rely on mutexes inside the checker, but given the coarseness of > dev->struct_mutex doing so is extremely hard. > > Give in and run from a work queue, i.e. outside of softirq. > > v2: > > The conversion does have one significant change, from the use of > mod_timer to schedule_delayed_work, means that the time that we execute > the first hangcheck is fixed and not continually deferred by later work. > This has the advantage of not allowing userspace to fill the ring before > hangcheck can finally run. At the same time, it removes the ability for > the interrupt to defer the hangcheck as well. This is sensible for that > an interrupt is only for a single engine, whereas we perform hangcheck > globally, so whilst one ring may have hung, the other could be running > normally and preventing the hangcheck from firing. We can drop this comment since we have already applied this change in an earlier patch. > @@ -3097,17 +3099,11 @@ static void i915_hangcheck_elapsed(unsigned long data) > > void i915_queue_hangcheck(struct drm_device *dev) > { > - struct drm_i915_private *dev_priv = dev->dev_private; > - struct timer_list *timer = &dev_priv->gpu_error.hangcheck_timer; > - > if (!i915.enable_hangcheck) > return; > > - /* Don't continually defer the hangcheck, but make sure it is active */ Keep the comment, or more preferrably let's capture the reason why we don't want to continually defer the hangcheck: /* Don't continually defer the hangcheck so that it is always run at * least once after work has been scheduled on any ring. Otherwise, * we will ignore a hung ring if a second ring is kept busy. */ > - if (timer_pending(timer)) > - return; > - mod_timer(timer, > - round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES)); > + schedule_delayed_work(&to_i915(dev)->gpu_error.hangcheck_work, > + round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES)); > } -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx