On Wed, Dec 04, 2013 at 01:08:56PM +0100, Daniel Vetter wrote: > On Wed, Dec 4, 2013 at 12:24 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > > On Tue, Dec 03, 2013 at 06:10:05PM +0100, Daniel Vetter wrote: > >> On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote: > >> > We used to lean on active_list to handle the references > >> > to batch objects. But there are useful cases when same, > >> > albeit simple, batch can be executing on multiple rings > >> > concurrently. For this case the active_list reference count > >> > handling is just not enough as batch could be freed by > >> > ring A request retirement as it is still running on ring B. > >> > > >> > Fix this by doing proper batch_obj reference counting. > >> > > >> > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > >> > > >> > Notes: > >> > This is a patch which ameliorates the > >> > [PATCH] tests/gem_reset_stats: add close-pending-fork > >> > > >> > Chris wasn't happy about the refcounting as it might hide > >> > the true problem. But I haven't been able to find the real culprit, > >> > thus the RFC. > >> > >> I think I understand the bug now, and your patch looks like the correct > >> fix. But we need to pimp the commit message. > >> > >> In i915_gem_reset_ring_lists we reset requests and move objects to the > >> inactive list. Which means if the active list is the last one to hold a > >> reference, the object will disappear. > >> > >> Now the problem is that we do this per-ring, and not in the order that the > >> objects would have been retired if the gpu wouldn't have hung. E.g. if a > >> batch is active on both ring 1&2 but was last active on ring 1, then we'd > >> free it before we go ahead with cleaning up the requests for ring 2. > >> > >> So reference-counting the batch_obj pointers looks like the real fix. > > > > No. The bug only exists in i915_gem_reset() and should not impact > > anywhere else. > > Well fixing the bug in i915_gem_reset would be lots more work and > rather fragile - we'd need to retire requests in the correct order. > That level of fanciness is generally not something I like to see in > less-tested code like the reset paths. Nope, just the operations in the right order. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx