The prior assumption was that we did not need to reset the CSB on wedging when cancelling the outstanding requests as it would be cleaned up in the subsequent reset prior to restarting the GPU. However, what was not accounted for was that in performing the reset, we would try to process the outstanding CSB entries. If the GPU happened to complete a CS event just as we were performing the cancellation of requests, that event would be kept in the CSB until the reset -- but our bookkeeping was cleared, causing confusion when trying to complete the CS event. v2: Use a sanitize on unwedge to avoid interfering with eio suspend (where we intentionally disable GPU reset). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925 Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_gem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d6f2bbd6a0dc..c8020719bcfb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3438,6 +3438,9 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915) i915_retire_requests(i915); GEM_BUG_ON(i915->gt.active_requests); + if (!intel_gpu_reset(i915, ALL_ENGINES)) + intel_engines_sanitize(i915); + /* * Undo nop_submit_request. We prevent all new i915 requests from * being queued (by disallowing execbuf whilst wedged) so having -- 2.19.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx