Quoting Michał Winiarski (2018-03-10 11:07:03)
> [ 59.708020] [drm:error_state_write [i915]] Resetting error state
> [ 59.708508] [IGT] gem_exec_capture: starting subtest capture-vebox
> [ 59.718849] [drm] GPU HANG: ecode 9:0:0xfff7fffe, reason: Manually set
> wedged engine mask = ffffffffffffffff, action: reset
> [ 59.719421] i915 0000:00:02.0: Resetting vecs0 after gpu hang
> [ 59.720276] [drm:i915_gem_reset_engine [i915]] resetting vecs0 to restart
> from tail of request 0x1
> [ 59.721008] [drm:i915_reset_device [i915]] resetting chip
> [ 59.721226] i915 0000:00:02.0: Resetting chip after gpu hang
> [ 59.721575] i915 0000:00:02.0: GPU recovery failed

Full device reset doesn't handle being called from a failed per-engine
reset. Whoops.

It doesn't look like there's any reason for the per-engine reset to have
failed either,

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f3104488c..44eef355e12c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	 */
 	intel_runtime_pm_get(dev_priv);
 
+	engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
 	i915_capture_error_state(dev_priv, engine_mask, error_msg);
 	i915_clear_error_registers(dev_priv);
 

should fix the immediate problem; but there's no reason afaict for this
to vary between test runs.

As to how to properly ignore left-over state from per-engine reset when
doing the full-reset fallback... ugh.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
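
For anyone following along, here is a rough userspace sketch of what the one-line mask in the diff above is meant to do: clamp the all-ones "wedge everything" request from the log down to the engines the device actually has. The DEMO_* bit positions, clamp_engine_mask() and demo_clamp_engine_mask() are made up for illustration and are not the driver's real enum values or helpers; the only thing taken from the patch is the `requested & ring_mask` operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, one bit per engine (not the i915 values). */
enum {
	DEMO_RCS  = 1u << 0,
	DEMO_BCS  = 1u << 1,
	DEMO_VCS  = 1u << 2,
	DEMO_VECS = 1u << 4,
};

/* Drop any requested engine bits that the device does not have. */
static uint32_t clamp_engine_mask(uint32_t requested, uint32_t ring_mask)
{
	return requested & ring_mask;
}

/* Self-check: an all-ones request collapses to the engines present. */
static void demo_clamp_engine_mask(void)
{
	const uint32_t ring_mask = DEMO_RCS | DEMO_BCS | DEMO_VCS | DEMO_VECS;

	assert(clamp_engine_mask(0xffffffffu, ring_mask) == ring_mask);
	/* A request for a single present engine passes through unchanged. */
	assert(clamp_engine_mask(DEMO_VECS, ring_mask) == DEMO_VECS);
	/* A bit for an engine that is not present is dropped entirely. */
	assert(clamp_engine_mask(1u << 7, ring_mask) == 0);
}
```

With the mask applied up front, the capture and reset paths only ever see bits for engines that exist, which is the "immediate problem" part; it does nothing for the left-over state question in the full-reset fallback.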