Quoting Mika Kuoppala (2018-03-22 14:26:41) > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > If we fail to reset the GPU in a timely fashion, dump the GEM trace so > > that we can see what operations were in flight when the GPU got stuck. > > > > v2: There's more than one timeout that deserves tracing! > > v3: Silence checkpatch by not even using a product at all! > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 23 ++++++++++++++++++++--- > > 1 file changed, 20 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c > > index 4372826998aa..9b235dae8dd9 100644 > > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c > > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c > > @@ -260,8 +260,11 @@ static void wedge_me(struct work_struct *work) > > { > > struct wedge_me *w = container_of(work, typeof(*w), work.work); > > > > - pr_err("%pS timed out, cancelling all further testing.\n", > > - w->symbol); > > + pr_err("%pS timed out, cancelling all further testing.\n", w->symbol); > > + > > + GEM_TRACE("%pS timed out.\n", w->symbol); > > + GEM_TRACE_DUMP(); > > + > > i915_gem_set_wedged(w->i915); > > } > > > > @@ -621,9 +624,19 @@ static int active_engine(void *data) > > mutex_unlock(&engine->i915->drm.struct_mutex); > > > > if (old) { > > - i915_request_wait(old, 0, MAX_SCHEDULE_TIMEOUT); > > + if (i915_request_wait(old, 0, HZ) < 0) { > > + GEM_TRACE("%s timed out.\n", engine->name); > > + GEM_TRACE_DUMP(); > > + > > + i915_gem_set_wedged(engine->i915); > > + i915_request_put(old); > > + err = -EIO; > > + break; > > + } > > Using err = i915_request_wait() could have saved one extra request_put > but I dunno if it would be any cleaner. It's also -ETIME, which didn't fit my intention. > > > i915_request_put(old); > > } > > + > > + cond_resched(); > > To give more slack for other engines and main thread to proceed? Yes. Otherwise, it spins mighty fine. > > > } > > > > for (count = 0; count < ARRAY_SIZE(rq); count++) > > @@ -1126,6 +1139,10 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915) > > > > err = i915_subtests(tests, i915); > > > > + mutex_lock(&i915->drm.struct_mutex); > > + flush_test(i915, I915_WAIT_LOCKED); > > + mutex_unlock(&i915->drm.struct_mutex); > > + > > To wash out leftovers. Yeah, from the early abort we left requests unaccounted for and needed to grab the struct_mutex to run retire. Otherwise we hit assertions later on about trying to unload the driver with requests still inflight. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx