Re: [PATCH v2] drm/i915/selftests: Include the trace as a debug aide

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quoting Jeff McGee (2018-03-22 19:29:16)
> On Thu, Mar 22, 2018 at 02:30:09PM +0000, Chris Wilson wrote:
> > Quoting Mika Kuoppala (2018-03-22 14:26:41)
> > > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> > > 
> > > > If we fail to reset the GPU in a timely fashion, dump the GEM trace so
> > > > that we can see what operations were in flight when the GPU got stuck.
> > > >
> > > > v2: There's more than one timeout that deserves tracing!
> > > > v3: Silence checkpatch by not even using a product at all!
> > > >
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 23 ++++++++++++++++++++---
> > > >  1 file changed, 20 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > > > index 4372826998aa..9b235dae8dd9 100644
> > > > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > > > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > > > @@ -260,8 +260,11 @@ static void wedge_me(struct work_struct *work)
> > > >  {
> > > >       struct wedge_me *w = container_of(work, typeof(*w), work.work);
> > > >  
> > > > -     pr_err("%pS timed out, cancelling all further testing.\n",
> > > > -            w->symbol);
> > > > +     pr_err("%pS timed out, cancelling all further testing.\n", w->symbol);
> > > > +
> > > > +     GEM_TRACE("%pS timed out.\n", w->symbol);
> > > > +     GEM_TRACE_DUMP();
> > > > +
> > > >       i915_gem_set_wedged(w->i915);
> > > >  }
> > > >  
> > > > @@ -621,9 +624,19 @@ static int active_engine(void *data)
> > > >               mutex_unlock(&engine->i915->drm.struct_mutex);
> > > >  
> > > >               if (old) {
> > > > -                     i915_request_wait(old, 0, MAX_SCHEDULE_TIMEOUT);
> > > > +                     if (i915_request_wait(old, 0, HZ) < 0) {
> > > > +                             GEM_TRACE("%s timed out.\n", engine->name);
> > > > +                             GEM_TRACE_DUMP();
> > > > +
> > > > +                             i915_gem_set_wedged(engine->i915);
> > > > +                             i915_request_put(old);
> > > > +                             err = -EIO;
> > > > +                             break;
> > > > +                     }
> > > 
> > > Using err = i915_request_wait() could have saved one extra request_put
> > > but I dunno if it would be any cleaner.
> > 
> > It's also -ETIME, which didn't fit my intention.
> > 
> > > 
> > > >                       i915_request_put(old);
> > > >               }
> > > > +
> > > > +             cond_resched();
> > > 
> > > To give more slack for other engines and main thread to proceed?
> > 
> > Yes. Otherwise, it spins mighty fine.
> > > 
> > > >       }
> > > >  
> > > >       for (count = 0; count < ARRAY_SIZE(rq); count++)
> > > > @@ -1126,6 +1139,10 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> > > >  
> > > >       err = i915_subtests(tests, i915);
> > > >  
> > > > +     mutex_lock(&i915->drm.struct_mutex);
> > > > +     flush_test(i915, I915_WAIT_LOCKED);
> > > > +     mutex_unlock(&i915->drm.struct_mutex);
> > > > +
> > > 
> > > To wash out leftovers.
> > 
> > Yeah, from the early abort we left requests unaccounted for and needed
> > to grab the struct_mutex to run retire. Otherwise we hit assertions
> > later on about trying to unload the driver with requests still inflight.
> > -Chris
> 
> On this and the 3 others in this series...
> 
> Reviewed-by: Jeff McGee <jeff.mcgee@xxxxxxxxx>

Much appreciated. Pushed, let's hope CI holds up and that we're ready to
start talking about the real changes required for forced preemption as
opposed to getting the existing code working.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux