Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> As we declare the GPU wedged if the reset fails, such a failure is quite
> terminal. Before taking that drastic action, let's sleep first and try
> active, in the hope that the hardware has quietened down and is then
> able to reset. After a few such attempts, it is fair to say that the HW
> is truly wedged.
>
> v2: Always print the failure message now, we precheck whether resets are
> disabled.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104007
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e0f053f9c186..7faf20aff25a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1877,7 +1877,9 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  {
>  	struct i915_gpu_error *error = &i915->gpu_error;
>  	int ret;
> +	int i;
>
> +	might_sleep();
>  	lockdep_assert_held(&i915->drm.struct_mutex);
>  	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
>
> @@ -1900,12 +1902,20 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  		goto error;
>  	}
>
> -	ret = intel_gpu_reset(i915, ALL_ENGINES);
> +	if (!intel_has_gpu_reset(i915)) {
> +		DRM_DEBUG_DRIVER("GPU reset disabled\n");
> +		goto error;
> +	}
> +
> +	for (i = 0; i < 3; i++) {
> +		ret = intel_gpu_reset(i915, ALL_ENGINES);
> +		if (ret == 0)
> +			break;
> +
> +		msleep(100);

Seems reasonable to try a few times and pause between defibrillation
attempts instead of throwing dirt on top of the coffin right off the bat.

Also, I have been pondering whether we should add a mini-check to
intel_gpu_reset() to poke the GPU and confirm that it is really there.
Like doing a few nops in the (render) ringbuffer and seeing if the head
moves before declaring the reset a success? Not that we would not see
it in init right after, but it would give us a more precise location of
the failure instead of initing a dead GPU.

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

-Mika

> +	}
>  	if (ret) {
> -		if (ret != -ENODEV)
> -			DRM_ERROR("Failed to reset chip: %i\n", ret);
> -		else
> -			DRM_DEBUG_DRIVER("GPU reset disabled\n");
> +		dev_err(i915->drm.dev, "Failed to reset chip\n");
>  		goto error;
>  	}
>
> --
> 2.15.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
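
For illustration, the bounded retry loop from the patch, followed by the
kind of head-moves mini-check suggested in the review, could look roughly
like this in plain userspace C. This is only a sketch under stated
assumptions: struct fake_gpu, fake_gpu_reset(), fake_gpu_head_moves(),
sleep_ms() and reset_with_retry() are hypothetical stand-ins and not i915
functions, and nanosleep() takes the place of the kernel's msleep().

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the hardware; not an i915 structure. */
struct fake_gpu {
	int failures_left;	/* reset attempts that will still fail */
	int reset_calls;	/* how many resets have been attempted */
	unsigned int head;	/* pretend ring head pointer */
	bool alive;		/* set once a reset has succeeded */
};

/* Pretend reset: returns 0 on success, -1 on failure. */
static int fake_gpu_reset(struct fake_gpu *gpu)
{
	gpu->reset_calls++;
	if (gpu->failures_left > 0) {
		gpu->failures_left--;
		return -1;
	}
	gpu->alive = true;
	return 0;
}

/* Pretend probe: "execute" a few nops and report whether head advanced. */
static bool fake_gpu_head_moves(struct fake_gpu *gpu)
{
	unsigned int before = gpu->head;

	if (gpu->alive)
		gpu->head += 4;
	return gpu->head != before;
}

/* Userspace substitute for msleep(). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Try the reset up to max_attempts times, sleeping after each failure
 * (as the loop in the patch does), then confirm the device responds.
 * Returns 0 only if a reset succeeded and the head pointer moved.
 */
static int reset_with_retry(struct fake_gpu *gpu, int max_attempts,
			    long delay_ms)
{
	int ret = -1;
	int i;

	assert(max_attempts > 0);

	for (i = 0; i < max_attempts; i++) {
		ret = fake_gpu_reset(gpu);
		if (ret == 0)
			break;

		sleep_ms(delay_ms);
	}
	if (ret)
		return ret;	/* the caller would declare the device wedged */

	return fake_gpu_head_moves(gpu) ? 0 : -1;
}
```

Calling reset_with_retry(&gpu, 3, 100) mirrors the three attempts and
100 ms pause used in the patch. In this sketch a device that fails twice
and then recovers is reported as reset, while one that keeps failing is
given up on after the third attempt.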