On Tue, Dec 03, 2013 at 06:10:05PM +0100, Daniel Vetter wrote: > On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote: > > We used to lean on active_list to handle the references > > to batch objects. But there are useful cases when same, > > albeit simple, batch can be executing on multiple rings > > concurrently. For this case the active_list reference count > > handling is just not enough as batch could be freed by > > ring A request retirement as it is still running on ring B. > > > > Fix this by doing proper batch_obj reference counting. > > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > > > Notes: > > This is a patch which ameliorates the > > [PATCH] tests/gem_reset_stats: add close-pending-fork > > > > Chris wasn't happy about the refcounting as it might hide > > the true problem. But I haven't been able to find the real culprit, > > thus the RFC. > > I think I understand the bug now, and your patch looks like the correct > fix. But we need to pimp the commit message. > > In i915_gem_reset_ring_lists we reset requests and move objects to the > inactive list. Which means if the active list is the last one to hold a > reference, the object will disappear. > > Now the problem is that we do this per-ring, and not in the order that the > objects would have been retired if the gpu wouldn't have hung. E.g. if a > batch is active on both ring 1&2 but was last active on ring 1, then we'd > free it before we go ahead with cleaning up the requests for ring 2. > > So reference-counting the batch_obj pointers looks like the real fix. No. The bug only exists in i915_gem_reset() and should not impact anywhere else. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx