Quoting Tvrtko Ursulin (2020-01-21 11:20:36) > > On 21/01/2020 10:09, Chris Wilson wrote: > > If we encounter a hang on a virtual engine, as we process the hang the > > request may already have been moved back to the virtual engine (we are > > processing the hang on the physical engine). We need to reclaim the > > request from the virtual engine so that the locking is consistent and > > local to the real engine on which we will hold the request for error > > state capturing. > > > > Fixes: 748317386afb ("drm/i915/execlists: Offline error capture") > > Testcase: igt/gem_exec_balancer/hang > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 3a30767ff0c4..a0acf1898c1e 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -2582,6 +2582,26 @@ static void execlists_capture(struct intel_engine_cs *engine) > > cap->rq = active_request(cap->rq->context->timeline, cap->rq); > > GEM_BUG_ON(!cap->rq); > > > > + if (cap->rq->engine != engine) { /* preempted virtual engine */ > > + struct virtual_engine *ve = to_virtual_engine(cap->rq->engine); > > + unsigned long flags; > > + > > + /* > > + * An unsubmitted request along a virtual engine will > > + * remain on the active (this) engine until we are able > > + * to process the context switch away (and so mark the > > + * context as no longer in flight). That cannot have happened > > + * yet, otherwise we would not be hanging! > > + */ > > + spin_lock_irqsave(&ve->base.active.lock, flags); > > + GEM_BUG_ON(intel_context_inflight(cap->rq->context) != engine); > > + GEM_BUG_ON(ve->request != cap->rq); > > + ve->request = NULL; > > + spin_unlock_irqrestore(&ve->base.active.lock, flags); > > + > > + cap->rq->engine = engine; > > + } > > Conceptually this I think belongs in execlists_hold, not capture. Since > hold has the responsibility to hold correctly. Yeah, I didn't put it into execlists_hold() as thought that would be clearer without it, and that this was more a side-effect of process_csb() vs __unwind_incomplete_requests(), and didn't want to mix up with the question of engine->active.lock Agreed that execlists_capture() does not describe it well, so probably shouldn't be called directly from here. > Then also, this is all about a race with __unwind_incomplete_requests, > yes? If so would need to be done under the engine->active.lock or this > can still happen between now and execlists_hold. The serialisation here is on process_csb(), since we have prevented the tasklet being called and own the reset on the engine, we know that execlists_schedule_out() can't be called on this engine, and so we own the virtual request. It's not protected by the engine->active.lock. So happy with moving it to before the lock in execlists_hold()? -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx