Quoting Tvrtko Ursulin (2019-09-23 14:46:27) > > On 21/09/2019 10:55, Chris Wilson wrote: > > Due to the nature of preempt-to-busy the execlists active tracking and > > the schedule queue may become temporarily desync'ed (between resubmission > > to HW and its ack from HW). This means that we may have unwound a > > request and passed it back to the virtual engine, but it is still > > inflight on the HW and may even result in a GPU hang. If we detect that > > GPU hang and try to reset, the hanging request->engine will no longer > > match the current engine, which means that the request is not on the > > execlists active list and we should not try to find an older incomplete > > request. Given that we have deduced this must be a request on a virtual > > engine, it is the single active request in the context and so must be > > guilty (as the context is still inflight, it is prevented from being > > executed on another engine as we process the reset). > > > > Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++-- > > drivers/gpu/drm/i915/gt/intel_reset.c | 4 +--- > > 2 files changed, 6 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 1b2bacc60300..3eadd294bcd7 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -2326,11 +2326,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine) > > > > static struct i915_request *active_request(struct i915_request *rq) > > { > > - const struct list_head * const list = > > - &i915_request_active_timeline(rq)->requests; > > const struct intel_context * const ce = rq->hw_context; > > struct i915_request *active = NULL; > > + struct list_head *list; > > > > + if (!i915_request_is_active(rq)) /* unwound, but incomplete! */ > > Is it time to store the veng pointer separately in the request so we can > add the assert here? Like > GEM_BUG_ON(!...engine_is_virtual(rq->orig_engine)) ? intel_engine_is_virtual(rq->hw_context->engine). But this is currently agnostic, and here we are only asking question is this request in the engine->active.list, which applies to both normal and virtual setups. Hmm, this bug does apply to normal reset, I only spotted it because we added the lockdep warnings that picked up that rq->engine was different. There maybe some use for that assertion, but I don't think this is one. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx