Simply failing to reset the gpu because someone else might still hold the mutex isn't a great idea - I see reliable silent reset failures. And gpu reset simply needs to be reliable and Just Work. "But ... the deadlocks!" We already kick all processes waiting for the gpu before launching the reset work item. New waiters need to check the wedging state anyway and then bail out. If we have places that can deadlock, we simply need to fix them. "But ... testing!" We have the gpu hangman, and if the current gpu load gem_exec_nop isn't good enough to hit a specific case, we can add a new one. "But ... don't we return -EAGAIN for non-interruptible calls to wait_seqno now?" Yep, but this problem already exists in the current code. A follow up patch will remedy this by returning -EIO for non-interruptible sleeps if the gpu died and the low-level wait bails out with -EAGAIN. Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch> --- drivers/gpu/drm/i915/i915_drv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 79be879..0e114f7 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -866,8 +866,7 @@ int i915_reset(struct drm_device *dev) if (!i915_try_reset) return 0; - if (!mutex_trylock(&dev->struct_mutex)) - return -EBUSY; + mutex_lock(&dev->struct_mutex); i915_gem_reset(dev); -- 1.7.10