Re: [PATCH] drm/i915: Treat i915_reset_engine() as guilty until proven innocent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/6/2018 2:30 PM, Chris Wilson wrote:
Quoting Michel Thierry (2018-04-06 22:23:21)
And I thought we believed in presumption of innocence...

On 4/6/2018 2:00 PM, Chris Wilson wrote:
If we are resetting just one engine, we know it has stalled. So we can
pass the stalled parameter directly to i915_gem_reset_engine(), which
alleviates the necessity to poke at the generic engine->hangcheck.stalled
magic variable, leaving that under control of hangcheck as its name
implies. Other than simplifying by removing the indirect parameter along
this path, this allows us to introduce new reset mechanisms that run
independently of hangcheck.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
Cc: Jeff McGee <jeff.mcgee@xxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_drv.c               |  2 +-
   drivers/gpu/drm/i915/i915_drv.h               |  3 +-
   drivers/gpu/drm/i915/i915_gem.c               | 36 +++++++++----------
   .../gpu/drm/i915/selftests/intel_hangcheck.c  |  9 -----
   4 files changed, 20 insertions(+), 30 deletions(-)

...
@@ -774,7 +766,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
                               break;
                       }
- engine->hangcheck.stalled = false;
                       count++;
if (rq) {


Are the ones in igt_handle_error() still needed?
    hangcheck.stalled = true;
    hangcheck.seqno = intel_engine_get_seqno(engine);

Because igt_handle_error is sending a real request.

(I think the only ones remaining in the selftest should be in
fake_hangcheck).

Right, fake_hangcheck definitely still needs it to behave like
hangcheck.

i915_handle_error() is still "odd". At the moment, yes we still need to
be poking where we shouldn't. If i915_handle_error() uses the
engine_mask to do per-engine resets, no, we don't need the magic
hangcheck.stalled. But, if it falls back to the full device level, we
loose the guilty reset.  So we do get a difference in behaviour, that
really hasn't been noticed before as the only real caller is from
hangcheck. (i915_wedged, I dare anyone to say what they expect ;)


Not the first time I forget about the full reset path.
Thanks for explaining it.

I think the answer will be to pass engine_mask to i915_reset. But I
haven't fleshed that out yet. I think it means we do away with
hangcheck.seqno as well, so bonus?


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux