Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 29 Aug 2017 16:22:40 +0100

Quoting Jeff McGee (2017-08-28 20:46:00)
> On Mon, Aug 28, 2017 at 12:41:58PM -0700, Michel Thierry wrote:
> > On 28/08/17 12:25, jeff.mcgee@xxxxxxxxx wrote:
> > >From: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> > >
> > >If someone else is resetting the engine we should clear our own bit as
> > >part of skipping that engine. Otherwise we will later believe that it
> > >has not been reset successfully and then trigger full gpu reset. If the
> > >other guy's reset actually fails, he will trigger the full gpu reset.
> > >
> > 
> > Did you hit this by manually setting wedged to 'x' ring repeatedly?
> > 
> I haven't actually reproduced it. Have just been looking at the code a
> lot to try to develop reset for preemption enforcement. The implementation
> will call i915_handle_error from another work item that can run concurrent
> with hangcheck.

Note that hitting it in practice is a nasty bug. The assumption is that between
a pair of resets there was sufficient time for the engine to recover,
and so if we reset too quickly we conclude that the reset/recovery
mechanism is broken.

And if you do start playing with fast resets, you very quickly find that
kthread_park is a livelock waiting to happen.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx