Re: [PATCH 2/2] drm/i915/selftests: Improve error detection of reset failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quoting Mika Kuoppala (2019-03-12 12:33:37)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > Use a timedwait to promptly detect if the recovery after reset fails and
> > provide a meaningful debug dump.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >  .../gpu/drm/i915/selftests/intel_hangcheck.c   | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index 10658ad05305..b5e35b2a925f 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -983,7 +983,23 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
> >                       count++;
> >  
> >                       if (rq) {
> > -                             i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
> > +                             if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> > +                                     struct drm_printer p =
> > +                                             drm_info_printer(i915->drm.dev);
> > +
> > +                                     pr_err("i915_reset_engine(%s:%s):"
> > +                                            " failed to complete request after reset\n",
> > +                                            engine->name, test_name);
> > +                                     intel_engine_dump(engine, &p,
> > +                                                       "%s\n", engine->name);
> > +                                     i915_request_put(rq);
> > +
> > +                                     GEM_TRACE_DUMP();
> 
> Would this be useful also in above, where we wait until request
> is running?

Hard to tell until we hit it. The big problem here is the trace is very
noisy; the important bits are in the engine dump which provide the
needle. And the trace may or may not have the information you are
looking for.

Generally, it's the post reset recovery that fails; once we are happy
the reset is ok, the next request is also likely to work. Or it may not,
hence the test!

In the case I was debugging, it wasn't so much a failure in reset
handling, but this test had the precise timing required to fool
unwind_incomplete_requests.

> > +                                     i915_gem_set_wedged(i915);
> > +                                     err = -EIO;
> 
> Moving the *rq out from loop and then doing if(rq) i915_request_put(rq);
> out loop after exit, might be easier on error handling, shrug.

Or more convoluted. Didn't feel particularly happy.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux