Quoting Tvrtko Ursulin (2018-05-30 11:43:16) > > On 29/05/2018 14:29, Chris Wilson wrote: > > Since we use i915_gem_find_active_request() from inside > > intel_engine_dump() and may call that at any time, we do not guarantee > > that the engine is paused nor that the signal kthreads and irq handler > > are suspended, so we cannot assert that the breadcrumb doesn't advance > > and that the irq hasn't happened on another CPU signaling the request we > > believe to be idle. > > > > Fixes: f636edb214a5 ("drm/i915: Make i915_engine_info pretty printer to standalone") > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++--------- > > 1 file changed, 8 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 05f44ca35a06..530d6d0109b4 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -2972,23 +2972,22 @@ i915_gem_find_active_request(struct intel_engine_cs *engine) > > struct i915_request *request, *active = NULL; > > unsigned long flags; > > > > - /* We are called by the error capture and reset at a random > > - * point in time. In particular, note that neither is crucially > > - * ordered with an interrupt. After a hang, the GPU is dead and we > > - * assume that no more writes can happen (we waited long enough for > > - * all writes that were in transaction to be flushed) - adding an > > + /* > > + * We are called by the error capture, reset and to dump engine > > + * state at random points in time. In particular, note that neither is > > + * crucially ordered with an interrupt. After a hang, the GPU is dead > > + * and we assume that no more writes can happen (we waited long enough > > + * for all writes that were in transaction to be flushed) - adding an > > * extra delay for a recent interrupt is pointless. Hence, we do > > * not need an engine->irq_seqno_barrier() before the seqno reads. > > + * At all other times, we must assume the GPU is still running, but > > + * we only care about the snapshot of this moment. > > */ > > spin_lock_irqsave(&engine->timeline.lock, flags); > > list_for_each_entry(request, &engine->timeline.requests, link) { > > if (__i915_request_completed(request, request->global_seqno)) > > continue; > > > > - GEM_BUG_ON(request->engine != engine); > > Removal of this one belongs to virtual engine perhaps? Nah, even with virtual engine; the engine timeline list is still true. I was just thinking it was too random to check here (e.g. in the middle of error capture) and that our more recent addition of checking during retirement was a little more rigorous (and timely). -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx