Re: [PATCH 1/2] drm/i915: Convert hangcheck from a timer into a delayed work item

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 23, 2015 at 02:44:07PM +0200, Mika Kuoppala wrote:
> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> 
> When run as a timer, i915_hangcheck_elapsed() must adhere to all the
> rules of running in a softirq context. This is advantageous to us as we
> want to minimise the risk that a driver bug will prevent us from
> detecting a hung GPU. However, that is irrelevant if the driver bug
> prevents us from resetting and recovering. Still it is prudent not to
> rely on mutexes inside the checker, but given the coarseness of
> dev->struct_mutex doing so is extremely hard.
> 
> Give in and run from a work queue, i.e. outside of softirq.
> 
> v2:
> 
> The conversion does have one significant change, from the use of
> mod_timer to schedule_delayed_work, means that the time that we execute
> the first hangcheck is fixed and not continually deferred by later work.
> This has the advantage of not allowing userspace to fill the ring before
> hangcheck can finally run. At the same time, it removes the ability for
> the interrupt to defer the hangcheck as well. This is sensible for that
> an interrupt is only for a single engine, whereas we perform hangcheck
> globally, so whilst one ring may have hung, the other could be running
> normally and preventing the hangcheck from firing.
> 
> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: Daniel Vetter <dnaiel.vetter@xxxxxxxxx>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> (v2)
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

One thing special with timers is that they'll always run, and if you do a
del_timer_sync from process context you can't deadlock with timer A
because you hold some locks for timer B that's in front of A in the
queues. With workqueues that's not the case, and it's really easy to cause
deadlocks by blocking some random work item in front of the queue by
accident.

I think for this switch we need our own, dedicated hangcheck work queue,
with it's own thread to make sure it gets run reliable. But besides that I
really like this chnage.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux