On Wed, Nov 25, 2015 at 06:45:26PM +0100, Daniel Vetter wrote: > My apologies to Chris Wilson for being dense on this topic for > essentially for years. > > This patch doesn't do any big redesign of the overall reset flow, but > instead just applies changes where it's needed to obey gem_eio. We can > redesign later on, but for now I prefer to make the testcase happy > with some minimally invasive changes: > > - Ensure that set_domain_ioctl never returns -EIO. Tricky part there > is that we might race with the reset handler when doing the lockless > wait. > > - Make sure debugfs/i915_wedged is actually useful to reset a wedged > gpu - atm it bails out with -EAGAIN for a terminally wedged gpu. > That's because reset_in_progress actually includes that terminal > state, which is super-confusing (and most callers got this wrong). > Fix this by changing the semantics of reset_in_progress to do what > it says on the tin (and fixup all other callers). > > Also make sure that reset_in_progress is cleared when we reach the > terminal "wedged" state. Without this we kept returning -EAGAIN in > some places forever. > > - Second problem with debugfs/i915_wedge was that we never got out of > the terminal wedged state. Hence manually set the reset counter out > of that state before starting the hung gpu recovery. > > For safety also make sure that we are consintent with resetting the > gpu between i915_reset_and_wakeup and i915_handler_error by just > passing the boolean "wedged" around instead of trying to recompute > it wrongly. This isn't an issue for gem_eio with the debugfs fix, > but just general robustness of the code. > > - Finally make sure that when gpu reset is disabled through the module > paramter the kernel doesn't generate dmesg noise that would upset > our CI/piglit. Simplest solution for that was to lift the i915.reset > check up into i915_reset(). > > With all these changes, plus the igt fixes I've posted, gem_eio is now > happy on many snb. > > v2: Don't upset lockdep with my set_domain_ioctl changes. > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Testcase: igt/gem_eio/* > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++ > drivers/gpu/drm/i915/i915_drv.c | 6 ++++++ > drivers/gpu/drm/i915/i915_drv.h | 2 +- > drivers/gpu/drm/i915/i915_gem.c | 23 +++++++++++++++-------- > drivers/gpu/drm/i915/i915_irq.c | 11 ++++++----- > drivers/gpu/drm/i915/intel_uncore.c | 3 --- > 6 files changed, 32 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c > index 411a9c68b4ee..c4006c09ef1b 100644 > --- a/drivers/gpu/drm/i915/i915_debugfs.c > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > @@ -4744,6 +4744,10 @@ i915_wedged_set(void *data, u64 val) > if (i915_reset_in_progress(&dev_priv->gpu_error)) > return -EAGAIN; > > + /* Already wedged, force us out of this terminal state. */ > + if (i915_terminally_wedged(&dev_priv->gpu_error)) > + atomic_set(&dev_priv->gpu_error.reset_counter, 0); What? > + > intel_runtime_pm_get(dev_priv); > > i915_handle_error(dev, val, > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index ec1e877c4a36..1f5386bb78f4 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -909,6 +909,12 @@ int i915_reset(struct drm_device *dev) > > simulated = dev_priv->gpu_error.stop_rings != 0; > > + if (!i915.reset) { > + DRM_INFO("GPU reset disabled in module option, not resetting\n"); > + mutex_unlock(&dev->struct_mutex); > + return -ENODEV; > + } > + > ret = intel_gpu_reset(dev); Please don't break intel_gpu_reset() like that. I think this patch missed the point entirely. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx