Quoting Tvrtko Ursulin (2018-05-30 12:01:47) > > On 30/05/2018 11:55, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-05-30 11:43:16) > >> > >> On 29/05/2018 14:29, Chris Wilson wrote: > >>> Since we use i915_gem_find_active_request() from inside > >>> intel_engine_dump() and may call that at any time, we do not guarantee > >>> that the engine is paused nor that the signal kthreads and irq handler > >>> are suspended, so we cannot assert that the breadcrumb doesn't advance > >>> and that the irq hasn't happened on another CPU signaling the request we > >>> believe to be idle. > >>> > >>> Fixes: f636edb214a5 ("drm/i915: Make i915_engine_info pretty printer to standalone") > >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > >>> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > >>> --- > >>> drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++--------- > >>> 1 file changed, 8 insertions(+), 9 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>> index 05f44ca35a06..530d6d0109b4 100644 > >>> --- a/drivers/gpu/drm/i915/i915_gem.c > >>> +++ b/drivers/gpu/drm/i915/i915_gem.c > >>> @@ -2972,23 +2972,22 @@ i915_gem_find_active_request(struct intel_engine_cs *engine) > >>> struct i915_request *request, *active = NULL; > >>> unsigned long flags; > >>> > >>> - /* We are called by the error capture and reset at a random > >>> - * point in time. In particular, note that neither is crucially > >>> - * ordered with an interrupt. After a hang, the GPU is dead and we > >>> - * assume that no more writes can happen (we waited long enough for > >>> - * all writes that were in transaction to be flushed) - adding an > >>> + /* > >>> + * We are called by the error capture, reset and to dump engine > >>> + * state at random points in time. In particular, note that neither is > >>> + * crucially ordered with an interrupt. After a hang, the GPU is dead > >>> + * and we assume that no more writes can happen (we waited long enough > >>> + * for all writes that were in transaction to be flushed) - adding an > >>> * extra delay for a recent interrupt is pointless. Hence, we do > >>> * not need an engine->irq_seqno_barrier() before the seqno reads. > >>> + * At all other times, we must assume the GPU is still running, but > >>> + * we only care about the snapshot of this moment. > >>> */ > >>> spin_lock_irqsave(&engine->timeline.lock, flags); > >>> list_for_each_entry(request, &engine->timeline.requests, link) { > >>> if (__i915_request_completed(request, request->global_seqno)) > >>> continue; > >>> > >>> - GEM_BUG_ON(request->engine != engine); > >> > >> Removal of this one belongs to virtual engine perhaps? > > > > Nah, even with virtual engine; the engine timeline list is still true. I > > was just thinking it was too random to check here (e.g. in the middle of > > error capture) and that our more recent addition of checking during > > retirement was a little more rigorous (and timely). > > It is a random check indeed. Commit message append: "While at it remove > a random assert on..." > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Pushed this patch by itself, just so I can have a new CI baseline :) Thanks for the review, -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx