On Mon, Nov 02, 2015 at 11:25:08AM +0200, Mika Kuoppala wrote: > Gen9 has had demonstrated cases where forcing a not ready gpu > into reset has caused system hang [1]. > > Gen8 has never to this date demonstrated such behaviour. > > In our CI tests there have been two cases of bsw ending in a state > where it claims it is not ready for reset, based on reset request, > after gpu hang [2]. Both of these cases have happened with kernel > where there are lots of debugs enabled. So it is possible that > something timing related is at play here like that wait_for_register() > usleep wakeups collide badly with forcewake. > > If we assume that gen8 is safe to reset even with claims of nonreadiness, > we can alleviate this situations by forcing a reset and revive the gpu. > Enhance logging so that it will be clear what conditions led to decision > of proceeding or bailing out, so that we will spot if this way of forcing > our will against gpu turns out to be foolhardy. > > v2: - add bugzilla entry for bsw behaviour (Chris) > - improve commit message > > References [1]: https://bugs.freedesktop.org/show_bug.cgi?id=89959 > References [2]: https://bugs.freedesktop.org/show_bug.cgi?id=92774 > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Cc: Tomi Sarvela <tomix.p.sarvela@xxxxxxxxx> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > --- > drivers/gpu/drm/i915/intel_uncore.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index f0f97b2..47c17f2 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1504,7 +1504,14 @@ not_ready: > I915_WRITE(RING_RESET_CTL(engine->mmio_base), > _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET)); > > - return -EIO; > + if (INTEL_INFO(dev)->gen == 9) { > + DRM_ERROR("Reset would risk system stability, bailing out\n"); > + return -EIO; > + } > + > + DRM_ERROR("Forcing non ready gpu into reset\n"); DRM_ERROR will cause dmes-warn failures when running igt. If we agree that this is ok, it should be at most DRM_INFO. I'd got with DRM_DEBUG_DRIVER. Ack with that change. -Daniel > + > + return gen6_do_reset(dev); > } > > static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) > -- > 2.5.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx