On Wed, Dec 06, 2017 at 02:48:36PM +0000, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-12-06 14:43:39)
> > On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> > > Since capturing the error state requires fiddling around with the GGTT
> > > to read arbitrary buffers and is itself run under stop_machine(), it
> > > deadlocks the machine (effectively a hard hang) when run in conjunction
> > > with Broxton's VTd workaround to serialize GGTT access.
> > >
> > > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>
> > > Cc: John Harrison <john.C.Harrison@xxxxxxxxx>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > index 48418fb81066..e6c7e8e53815 100644
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> > >  	if (!i915_modparams.error_capture)
> > >  		return;
> > >
> > > +	/* Prevent recursively calling stop_machine() and deadlocking. */
> > > +	if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> > > +		return;
> >
> > I'd put this closer to the stop machine, at the head of
> > i915_capture_gpu_state(). If the bogus debug output annoys then we could
> > switch that to a PTR_ERR return value I guess. But I guess this here is
> > ok too, so either way:
>
> I was considering doing some of the capture, skipping the buffers, but
> nowadays those buffers tend to be the crux of triaging. My only real concern
> is how to explain to the user that the error state cannot exist, for
> which we could go and add -ENODEV to sysfs/debugfs just to be clear.

Fancy idea: store the PTR_ERR in ->first.error and return that? Would
address both my bikeshed and your suggestion.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
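
To make the idea above a bit more concrete, here is a rough userspace sketch, not the actual i915 code. It assumes the capture path parks an error-encoded pointer in the first-error slot, and that the sysfs/debugfs reader turns it back into an errno. ERR_PTR(), PTR_ERR() and IS_ERR() are reimplemented locally as stand-ins for the kernel helpers. struct device_model, struct gpu_state, capture_error_state() and error_state_open() are hypothetical names made up for illustration, and needs_vtd_wa merely models the result of intel_ggtt_update_needs_vtd_wa().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced models of the driver structures. */
struct gpu_state {
	int dummy;
};

struct device_model {
	bool needs_vtd_wa;             /* models intel_ggtt_update_needs_vtd_wa() */
	struct gpu_state *first_error; /* models the first-error slot */
};

/* Capture side: record why no state exists instead of returning silently. */
static void capture_error_state(struct device_model *dev,
				struct gpu_state *state)
{
	assert(dev != NULL);

	if (dev->first_error)
		return; /* keep only the first error (or first failure reason) */

	if (dev->needs_vtd_wa) {
		/* Capturing would recurse into stop_machine(); note the reason. */
		dev->first_error = ERR_PTR(-ENODEV);
		return;
	}

	dev->first_error = state;
}

/* Reader side (sysfs/debugfs in the real driver): surface the errno. */
static int error_state_open(const struct device_model *dev)
{
	if (IS_ERR(dev->first_error))
		return (int)PTR_ERR(dev->first_error);

	return 0; /* either a real state or nothing captured yet */
}
```

With needs_vtd_wa set, calling capture_error_state() and then error_state_open() yields -ENODEV in this model, which is the "error state cannot exist" signal discussed above; without the workaround the captured state is stored as usual and the open succeeds.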