On Wed, Dec 04, 2013 at 01:18:42PM +0100, Daniel Vetter wrote: > On Wed, Dec 04, 2013 at 11:37:09AM +0000, Chris Wilson wrote: > > As the rings may be processed and their requests deallocated in a > > different order to the natural retirement during a reset, > > > > /* Whilst this request exists, batch_obj will be on the > > * active_list, and so will hold the active reference. Only when this > > * request is retired will the the batch_obj be moved onto the > > * inactive_list and lose its active reference. Hence we do not need > > * to explicitly hold another reference here. > > */ > > > > is violated, and the batch_obj may be dereferenced after it had been > > freed on another ring. This can be simply avoided by processing the > > status update prior to deallocating any requests. > > > > Fixes regression (a possible OOPS following a GPU hang) from > > commit aa60c664e6df502578454621c3a9b1f087ff8d25 > > Author: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Date: Wed Jun 12 15:13:20 2013 +0300 > > > > drm/i915: find guilty batch buffer on ring resets > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > Cc: stable@xxxxxxxxxxxxxxx > > --- > > drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++++++---------- > > 1 file changed, 19 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index ec4502034203..c1e481d36575 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -2442,15 +2442,24 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request) > > kfree(request); > > } > > > > -static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv, > > - struct intel_ring_buffer *ring) > > +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, > > + struct intel_ring_buffer *ring) > > { > > - u32 completed_seqno; > > - u32 acthd; > > + u32 completed_seqno = ring->get_seqno(ring, false); > > + u32 acthd = intel_ring_get_active_head(ring); > > + struct drm_i915_gem_request *request; > > + > > + list_for_each_entry(request, &ring->request_list, list) { > > + if (i915_seqno_passed(completed_seqno, request->seqno)) > > + continue; > > > > - acthd = intel_ring_get_active_head(ring); > > - completed_seqno = ring->get_seqno(ring, false); > > + i915_set_reset_status(ring, request, acthd); > > + } > > +} > > Indeed the fix in the gem reset code is a bit simpler than what I've > feared. We still have fairly tricky code which depends upon that implicit > reference in non-obvious ways. So I still think Mika's refcount patch with > the comments updated is the better approach. That batch_obj only exists for GPU hang accounting. It seems pointless to make everything else more complicated. If you really wanted to simplify it, you could store the actual batch offset+length rather than a pointer, which would be even safer. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html