On Sat, Feb 23, 2013 at 05:30:10PM -0800, Ben Widawsky wrote:
> On error, this represents the state of the currently running context at
> the time it was loaded.
>
> Unfortunately, since we're hung and can't switch out the context this
> may not tell us too much about the most current state of the context,
> but does give clues about what has happened since loading.
>
> Thanks to recent doc updates, we have a little more confidence regarding
> what is actually in this memory, and perhaps it will help us gain more
> insight into certain bugs. AFAICT, the most interesting info is in the
> first page. To save space, we only capture the first page. In the
> future, we might want to dump more.
>
> Sample of the relevant part of error state:
> --- HW Context = 0x01b20000
> 00000000 :  00000000 1100105f 00002028 ffff0880
> 00000010 :  0000209c feff4040 000020c0 efdf0080
> 00000020 :  00002178 00000001 0000217c 00145855
> 00000030 :  00002310 00000000 00002314 00000000
> 00000040 :  00002318 00000000 0000231c 00000000

Presentation looks reasonable, except it will confuse intel_error_decode
as it will match "%x : %x". How about "[%03x] %08x %08x %08x %08x"?

>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> ---
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e95337c..ab88620 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -209,6 +209,7 @@ struct drm_i915_error_state {
>  	u32 pgtbl_er;
>  	u32 ier;
>  	u32 ccid;
> +	struct drm_i915_error_object *ctx_obj;

Put it next to the other pointers; lest we want to start digging holes.

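To make the hole concrete, here is a minimal sketch. It assumes a target
where pointers are wider and more strictly aligned than u32 (e.g. x86-64).
Both structs are hypothetical, cut down to three fields that only borrow
names from the hunk above; neither is the real drm_i915_error_state layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down layout: pointer dropped between two u32 fields. */
struct interleaved {
	uint32_t ccid;
	void *ctx_obj;		/* may force padding before and after it */
	uint32_t derrmr;
};

/* Hypothetical cut-down layout: pointer kept apart from the u32 fields. */
struct grouped {
	void *ctx_obj;
	uint32_t ccid;		/* the two u32 fields can share one slot */
	uint32_t derrmr;
};

/* Bytes lost to padding by interleaving; 0 where pointers are 32-bit. */
static size_t hole_bytes(void)
{
	return sizeof(struct interleaved) - sizeof(struct grouped);
}
```

On a typical 64-bit ABI the interleaved variant ends up larger than the
grouped one, while on a 32-bit build the two are usually the same size, so
the cost only shows up where pointers are 8 bytes.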
>  	u32 derrmr;
>  	u32 forcewake;
>  	bool waiting[I915_NUM_RINGS];
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ebaf558..7f7d241 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1321,6 +1321,14 @@ static void i915_capture_error_state(struct drm_device *dev)
>  	error->pgtbl_er = I915_READ(PGTBL_ER);
>  	error->ccid = I915_READ(CCID);
>
> +	if (error->ccid && !dev_priv->hw_contexts_disabled) {
> +		list_for_each_entry(obj, &dev_priv->mm.active_list, mm_list)

I am doubtful that the active list will hold the object in all cases, as
we only put the context obj onto the active list when switching away.
I'd check the gtt_list to be on the safe side. And ignore what we think
of hw_contexts_disabled - if the CCID randomly points to one of our
objects, let's attach it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
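
A rough sketch of the lookup suggested above, written as a standalone
model rather than driver code. Everything named sketch_* or SKETCH_* is
hypothetical, the array stands in for a walk over the gtt_list, and the
page-masking of CCID is an assumption (that the low 12 bits carry flags
and the rest is a page-aligned GTT address), not something stated in the
patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, with the low 12 bits of CCID holding flags. */
#define SKETCH_PAGE_MASK (~(uint32_t)0xfff)

/* Hypothetical stand-in for an object bound into the GTT. */
struct sketch_obj {
	uint32_t gtt_offset;
};

/*
 * Search every bound object, not only the active ones, for the one the
 * CCID address points at. There is deliberately no check of whether HW
 * contexts are enabled: any non-zero CCID that matches is accepted.
 */
static const struct sketch_obj *
find_ctx_obj(const struct sketch_obj *bound, size_t count, uint32_t ccid)
{
	size_t i;

	if (!ccid)
		return NULL;

	for (i = 0; i < count; i++)
		if (bound[i].gtt_offset == (ccid & SKETCH_PAGE_MASK))
			return &bound[i];

	return NULL;
}
```

With the sample above (HW Context = 0x01b20000), a CCID of 0x01b2000d
would resolve to the object bound at that offset, and a CCID that matches
nothing simply leaves ctx_obj unset.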