Re: [PATCH] drm/i915: Flush pending interrupt following a GPU reset

Jeff McGee <jeff.mcgee@xxxxxxxxx> · Wed, 21 Mar 2018 10:10:56 -0700



On Wed, Mar 21, 2018 at 04:42:32PM +0000, Chris Wilson wrote:
> Quoting Jeff McGee (2018-03-21 15:55:16)
> > On Wed, Mar 21, 2018 at 03:00:23PM +0000, Chris Wilson wrote:
> > > After resetting the GPU (or subset of engines), call synchronize_irq()
> > > to flush any pending irq before proceeding with the cleanup. For a
> > > device level reset, we disable the interupts around the reset, but when
> > > resetting just one engine, we have to avoid such global disabling. This
> > > leaves us open to an interrupt arriving for the engine as we try to
> > > reset it. We already do try to flush the IIR following the reset, but we
> > > have to ensure that the in-flight interrupt does not land after we start
> > > cleaning up after the reset; enter synchronize_irq().
> > > 
> > > As it current stands, we very rarely, but fatally, see sequences such as:
> > > 
> > >     2.... 57964564us : execlists_reset_prepare: rcs0
> > >     2.... 57964613us : execlists_reset: rcs0 seqno=424
> > >     0d.h1 57964615us : gen8_cs_irq_handler: rcs0 CS active=1
> > >     2d..1 57964617us : __i915_request_unsubmit: rcs0 fence 29:1056 <- global_seqno 1060
> > >     2.... 57964703us : execlists_reset_finish: rcs0
> > >     0..s. 57964705us : execlists_submission_tasklet: rcs0 awake?=1, active=0, irq-posted?=1
> > > 
> > I can repro this sequence easily with force preemption IGT.
> 
> With the sequence I suggested?
> -Chris

Yes. Your approach to protecting port[1] context is working well. This is
the only issue I'm still hitting. I'll post my updated RFC set in a sec.
-Jeff
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx