Quoting Daniel Vetter (2017-12-06 14:43:39)
> On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> >
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>
> > Cc: John Harrison <john.C.Harrison@xxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 48418fb81066..e6c7e8e53815 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> >  	if (!i915_modparams.error_capture)
> >  		return;
> >
> > +	/* Prevent recursively calling stop_machine() and deadlocking. */
> > +	if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> > +		return;
>
> I'd put this closer to the stop machine, at the head of
> i915_capture_gpu_state(). If the bogus debug output annoys then we could
> switch that to an PTR_ERR return value I guess. But I guess this here is
> ok too, so either way:

I was considering doing some of the capture, skipping the buffers, but
nowadays those buffers tend to be the crux of triaging. My only real
concern is how to explain to the user that the error state cannot exist,
for which we could go and add -ENODEV to sysfs/debugfs just to be clear.
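
To make the -ENODEV idea concrete, here is a rough userspace sketch, not
driver code. struct fake_i915 and both fake_* functions are hypothetical
stand-ins I made up for illustration; only the two early returns mirror
the patch above, and the read handler is just one assumed way the
sysfs/debugfs side could report "no error state can exist here".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_i915_private; not the real layout. */
struct fake_i915 {
	bool error_capture_enabled; /* models i915_modparams.error_capture */
	bool needs_vtd_wa;          /* models intel_ggtt_update_needs_vtd_wa() */
	bool has_error_state;       /* set once a capture has been stored */
};

/* Models the early returns at the top of i915_capture_error_state(). */
static void fake_capture_error_state(struct fake_i915 *i915)
{
	assert(i915 != NULL);

	if (!i915->error_capture_enabled)
		return;

	/* Prevent recursively calling stop_machine() and deadlocking. */
	if (i915->needs_vtd_wa)
		return;

	/* The real capture would run here; the sketch only records it. */
	i915->has_error_state = true;
}

/*
 * Hypothetical read handler for the sysfs/debugfs error file.
 * Returns -ENODEV when a capture can never happen on this device,
 * otherwise 0 with *available saying whether a state is stored.
 */
static int fake_error_state_read(const struct fake_i915 *i915, bool *available)
{
	assert(i915 != NULL && available != NULL);

	if (i915->needs_vtd_wa)
		return -ENODEV;

	*available = i915->has_error_state;
	return 0;
}
```

The point of the distinct errno is that userspace can tell "nothing has
hung yet" (0, not available) apart from "this machine will never produce
an error state" (-ENODEV).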
-Chris