On Wed, Feb 22, 2017 at 05:30:59PM +0000, Tvrtko Ursulin wrote: > > On 22/02/2017 16:42, Chris Wilson wrote: > >When dma_fence_signal() is called, it sets a flag to indicate the fenece > >is complete. Before the dma_fence is signaled, the seqno check will > >first be passed. However, if the request is being held by an observer > >across a global seqno reset, it is possible for the observer to not > >notice the seqno completion (as the HWS is now zero), but instead we > >rely on the signaled bit. External holders of the request should be > >using the fence interface which does first check the signaled bit - it > >will only be an unlocked waiter that may not notice the request > >completion. > > > >By that analysis, the problem is old as our unlocked waits, but this > >patch depends upon subclassing dma-fence, so use that as the base > >commit. > > > >Fixes: 04769652c8c7 ("drm/i915: Derive GEM requests from dma-fence") > >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > >Cc: <stable@xxxxxxxxxxxxxxx> # v4.9+ > >--- > > drivers/gpu/drm/i915/i915_drv.h | 3 +++ > > 1 file changed, 3 insertions(+) > > > >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > >index 994929660027..dcc8f73b26bf 100644 > >--- a/drivers/gpu/drm/i915/i915_drv.h > >+++ b/drivers/gpu/drm/i915/i915_drv.h > >@@ -4026,6 +4026,9 @@ __i915_request_irq_complete(struct drm_i915_gem_request *req) > > { > > struct intel_engine_cs *engine = req->engine; > > > >+ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &req->fence.flags)) > >+ return true; > >+ > > /* Before we do the heavier coherent read of the seqno, > > * check the value (hopefully) in the CPU cacheline. > > */ > > > > It is obviously correct in a way that it can never give a false > positive, even though I haven't quite figured how we had false > negatives without it. I'll leave that for a new day. Hopefully there > are no other places in our code which could be affected in the same > way? For starters, we can exclude all checks that hold struct_mutex (from the time it gets the request from the source to the i915_gem_request_completed, they are protected by the init_global_seqno requiring the struct_mutex to do the rollover, and by retirement removing the state trackers). That rules out the majority. For the unlocked accessors, we can remove all the dma-fence clients as they already have a check against the signaled bit before checking against the seqno. The only place I found that was susceptible was the unlocked waiter.... And actually that is protected by init_global_seqno clearing the waiters before doing the rollover. Okay, no need to panic. I'll mollify the changelog and drop stable - I still want the patch for the later changes. -Chris -- Chris Wilson, Intel Open Source Technology Centre