On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote: > > On 17/11/15 17:08, Daniel Vetter wrote: > >On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote: > >> > >>On 17/11/15 16:39, Daniel Vetter wrote: > >>>On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote: > >>>>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>> > >>>>In the following commit: > >>>> > >>>> commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae > >>>> Author: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>> Date: Mon Oct 5 13:26:36 2015 +0100 > >>>> > >>>> drm/i915: Clean up associated VMAs on context destruction > >>>> > >>>>I added a WARN_ON assertion that VM's active list must be empty > >>>>at the time of owning context is getting freed, but that turned > >>>>out to be a wrong assumption. > >>>> > >>>>Due ordering of operations in i915_gem_object_retire__read, where > >>>>contexts are unreferenced before VMAs are moved to the inactive > >>>>list, the described situation can in fact happen. > >>>> > >>>>It feels wrong to do things in such order so this fix makes sure > >>>>a reference to context is held until the move to inactive list > >>>>is completed. > >>>> > >>>>v2: Rather than hold a temporary context reference move the > >>>> request unreference to be the last operation. (Daniel Vetter) > >>>> > >>>>v3: Fix use after free. (Chris Wilson) > >>>> > >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638 > >>>>Cc: Michel Thierry <michel.thierry@xxxxxxxxx> > >>>>Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>>>Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > >>>>--- > >>>> drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++--------------- > >>>> 1 file changed, 18 insertions(+), 15 deletions(-) > >>>> > >>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>>>index 98c83286ab68..094ac17a712d 100644 > >>>>--- a/drivers/gpu/drm/i915/i915_gem.c > >>>>+++ b/drivers/gpu/drm/i915/i915_gem.c > >>>>@@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring) > >>>> RQ_BUG_ON(!(obj->active & (1 << ring))); > >>>> > >>>> list_del_init(&obj->ring_list[ring]); > >>>>- i915_gem_request_assign(&obj->last_read_req[ring], NULL); > >>>> > >>>> if (obj->last_write_req && obj->last_write_req->ring->id == ring) > >>>> i915_gem_object_retire__write(obj); > >>>> > >>>> obj->active &= ~(1 << ring); > >>>>- if (obj->active) > >>>>- return; > >>> > >>> if (obj->active) { > >>> i915_gem_request_assign(&obj->last_read_req[ring], NULL); > >>> return; > >>> } > >>> > >>>Would result in less churn in the code and drop the unecessary indent > >>>level. Also comment is missing as to why we need to do things in a > >>>specific order. > >> > >>Actually I think I changed my mind and that v1 is the way to go. > >> > >>Just re-ordering the code here still makes it possible for the context > >>destructor to run with VMAs on the active list I think. > >> > >>If we hold the context then it is 100% clear it is not possible. > > > >request_assign _is_ the function which adjust the refcounts for us, which > >means if we drop that reference too early then grabbing a temp reference > >is just papering over the real bug. > > > >Written out your patch looks something like > > > > a_reference(a); > > a_unreference(a); > > > > /* more cleanup code that should get run before a_unreference but isn't */ > > > > a_unrefernce(a); /* for real this time */ > > > >Unfortunately foo_assign is a new pattern and not well-established, so > >that connection isn't clear. Maybe we should rename it to > >foo_reference_assign to make it more obvious. Or just drop the pretense > >and open-code it since we unconditionally assign NULL as the new pointer > >value, and we know the current value of the pointer is non-NULL. So > >there's really no benefit to the helper here, it only obfuscates. And > >since that obfuscation tripped you up it's time to remove it ;-) > > Then foo_reference_unreference_assign. :) > > But seriously, I think it is more complicated that.. > > The thing it trips over is that moving VMAs to inactive does not correspond > in time to request retirement. But in fact VMAs are moved to inactive only > when all requests associated with an object are done. > > This is the unintuitive thing I was working around. To make sure when > context destructor runs there are not active VMAs for that VM. > > I don't know how to guarantee that with what you propose. Perhaps I am > missing something? Ok, my example was slightly off, since we have 2 objects: b_reference(a->b); a_unreference(a); /* might unref a->b if it's the last reference */ /* more cleanup code that should get run before a_unreference but isn't */ b_unrefernce(a->b); /* for real this time */ Holding the ref to a makes sure that b doesn't disappear. We rely on that in a fundamental way (a request really needs the ctx to stick around), and the bug really is that we drop the ref to a too early. That it's the releasing of a->b which is eventually blowing things up doesn't really matter. Btw would it be possible to have an igt for this? I should be possible to hit this with some varian of gem_unref_active_buffers. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx