[PATCH 3/5] drm/i915: don't return a spurious -EIO from intel_ring_begin

daniel at ffwll.ch (Daniel Vetter) · Tue, 26 Jun 2012 01:05:24 +0200

On Tue, Jun 26, 2012 at 12:52 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Mon, 25 Jun 2012 23:48:01 +0200, Daniel Vetter <daniel at ffwll.ch> wrote:
>> So essentially I still fail to see the upside of your proposed duct tape
>> ... In either case I guess a walk to the reset button is inevitable
>> every once in a while ;-)
>
> A false positive for declaring a GPU wedged in a situation that should
> have never occurred in the first place is a recoverable and minor
> inconvenience compared to locking the display and possibly the system up.
>
> An alternative is to incorporate the deadlock detection into
> i915_mutex_lock_interruptible() and make it report -EIO if it waits
> longer than 10s, f.e., for the reset to complete. Then the only danger
> are the few paths that do not perform the error checking lock.

I kinda like this idea - all unconditional mutex_lockers would
deadlock in the same way as i915_reset, but if we've managed to
sprinkle our special reset-aware trylock code in all the right places,
at least userspace should get to the -EIO eventually and do something
sensible. I guess if someone is indeed hogging dev->struct_mutex
somehow (which /should/ be the only thing preventing i915_reset from
doing its job) there's not much userspace could actually do - it would
inevitably die on the next gtt pagefault. But I guess we can eke out
a bit more survivability in corner cases.

I'll see what this looks like in actual code tomorrow.

Thanks, Daniel
-- 
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch