Quoting Chris Wilson (2017-10-23 21:06:16)
> Back in commit a4b2b01523a8 ("drm/i915: Don't mark an execlists
> context-switch when idle") we noticed the presence of late
> context-switch interrupts. We were able to filter those out by looking
> at whether the ELSP remained active, but in commit beecec901790
> ("drm/i915/execlists: Preemption!") that became problematic as we now
> anticipate receiving a context-switch event for preemption while ELSP
> may be empty. To restore the spurious interrupt suppression, add a
> counter for the expected number of pending context-switches and skip if
> we do not need to handle this interrupt to make forward progress.

Looking at an example from
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_1299/
the common case is where we still get the interrupt after already
parsing the whole CSB:

<6>[ 22.723238] i915 0000:00:02.0: [drm] vecs0
<6>[ 22.723246] i915 0000:00:02.0: [drm] current seqno 8, last 8, hangcheck 0 [-277277 ms], inflight 0
<6>[ 22.723260] i915 0000:00:02.0: [drm] Reset count: 0
<6>[ 22.723269] i915 0000:00:02.0: [drm] Requests:
<6>[ 22.723278] i915 0000:00:02.0: [drm] RING_START: 0x007fb000 [0x00000000]
<6>[ 22.723289] i915 0000:00:02.0: [drm] RING_HEAD: 0x00000278 [0x00000000]
<6>[ 22.723300] i915 0000:00:02.0: [drm] RING_TAIL: 0x00000278 [0x00000000]
<6>[ 22.723311] i915 0000:00:02.0: [drm] RING_CTL: 0x00003001 []
<6>[ 22.723322] i915 0000:00:02.0: [drm] ACTHD: 0x00000000_00000278
<6>[ 22.723333] i915 0000:00:02.0: [drm] BBADDR: 0x00000000_00000004
<6>[ 22.723343] i915 0000:00:02.0: [drm] Execlist status: 0x00000301 00000000
<6>[ 22.723355] i915 0000:00:02.0: [drm] Execlist CSB read 1 [1 cached], write 1 [1 from hws], interrupt posted? no
<6>[ 22.723370] i915 0000:00:02.0: [drm] ELSP[0] idle
<6>[ 22.723378] i915 0000:00:02.0: [drm] ELSP[1] idle
<6>[ 22.723387] i915 0000:00:02.0: [drm] HW active? 0x0
<6>[ 22.723402] i915 0000:00:02.0: [drm]

Those should not lead to hitting BUG_ON(gt.awake) though as the tasklet
is flushed before we clear gt.awake. Except if maybe the interrupt
arrives after the tasklet_kill...

Given that we wait for the engines to be idle before parking, we should
be safe enough with

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bb0e85043e01..fa46137d431a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3327,6 +3327,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	if (new_requests_since_last_retire(dev_priv))
 		goto out_unlock;
 
+	synchronize_irq(dev_priv->drm.irq);
+
 	/*
 	 * We are committed now to parking the engines, make sure there
 	 * will be no more interrupts arriving later.

to flush a pending irq and not worry about a multi-phase park.
-Chris
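
For reference, the counter idea from the quoted commit message could be
sketched in plain userspace C roughly as below. This is only an
illustration under assumptions: struct sketch_engine and the sketch_*
helpers are hypothetical names, not the driver's structures or API, and
accounting for exactly one expected event per submission is a
simplification of what the hardware does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-engine state; not the real i915 struct. */
struct sketch_engine {
	unsigned int expected_cs; /* context-switch events still expected */
	unsigned int handled;     /* interrupts that were processed */
	unsigned int spurious;    /* interrupts that were skipped */
};

/*
 * Assumption: every submission (an ELSP write or an injected preemption)
 * is expected to raise one context-switch interrupt later on.
 */
static void sketch_submit(struct sketch_engine *e)
{
	assert(e->expected_cs != UINT_MAX); /* guard against wrap-around */
	e->expected_cs++;
}

/*
 * Interrupt path: returns true if the event was needed for forward
 * progress, false if it was a late (spurious) interrupt and was skipped.
 */
static bool sketch_context_switch_irq(struct sketch_engine *e)
{
	if (e->expected_cs == 0) {
		/* Nothing outstanding, e.g. the CSB was already fully parsed. */
		e->spurious++;
		return false;
	}

	e->expected_cs--;
	e->handled++;
	return true;
}
```

With one submission followed by two interrupts, the first is handled and
the second is counted as spurious, which is the "interrupt after already
parsing the whole CSB" case in the log above.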