Quoting Tvrtko Ursulin (2018-02-07 10:40:46) > > On 06/02/2018 09:46, Chris Wilson wrote: > > When a request is preempted, it is unsubmitted from the HW queue and > > removed from the active list of breadcrumbs. In the process, this > > however triggers the signaler and it may see the clear rbtree with the > > old, and still valid, seqno. This confuses the signaler into action and > > signaling the fence. > > > > Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue") > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > Cc: <stable@xxxxxxxxxxxxxxx> # v4.12+ > > --- > > drivers/gpu/drm/i915/intel_breadcrumbs.c | 20 ++++---------------- > > 1 file changed, 4 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c > > index efbc627a2a25..b955f7d7bd0f 100644 > > --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c > > +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c > > @@ -588,29 +588,16 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine, > > spin_unlock_irq(&b->rb_lock); > > } > > > > -static bool signal_valid(const struct drm_i915_gem_request *request) > > -{ > > - return intel_wait_check_request(&request->signaling.wait, request); > > -} > > - > > static bool signal_complete(const struct drm_i915_gem_request *request) > > { > > if (!request) > > return false; > > > > - /* If another process served as the bottom-half it may have already > > - * signalled that this wait is already completed. > > - */ > > - if (intel_wait_complete(&request->signaling.wait)) > > - return signal_valid(request); > > Okay so this can return true for unsubmitted requests since rb node will > be empty and global_seqno == wait.seqno == 0. Hmm, ah, signal_valid() operated under the belief that its wait.seqno was untouched. That makes a bit more sense now. I was having to concoct some scary data races to try and explain how global_seqno and wait.seqno had the same non-zero value. -Chris