On Tue, 2015-06-09 at 11:21 +0300, Jani Nikula wrote:
> On Mon, 08 Jun 2015, Imre Deak <imre.deak@xxxxxxxxx> wrote:
> > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > problem where the seqno written at GPU command completion time is not
> > seen by the CPU. This results in __i915_wait_request seeing the stale
> > seqno and not completing the request (not considering the lost
> > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > of a lost interrupt, or that the command didn't complete somehow: when
> > the coherency issue occurred I read the seqno via an uncached GTT mapping
> > too. While the cached version of the seqno still showed the stale value
> > the one read via the uncached mapping was the correct one.
> >
> > Work around this issue by clflushing the corresponding CPU cacheline
> > following any store of the seqno and preceding any reading of it. When
> > reading it do this only when the caller expects a coherent view.
> >
> > Testcase: igt/store_dword_loop_render
> > Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
> >  2 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 9f5485d..88bc5525 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
> >  
> >  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> >  {
> > +	/*
> > +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
> > +	 * storing the completed request's seqno occasionally doesn't
> > +	 * invalidate the CPU cache. Work around this by clflushing the
> > +	 * corresponding cacheline whenever the caller wants the coherency to
> > +	 * be guaranteed. Note that this cacheline is known to be
> > +	 * clean at this point, since we only write it in gen8_set_seqno(),
> > +	 * where we also do a clflush after the write. So this clflush in
> > +	 * practice becomes an invalidate operation.
> > +	 */
> > +	if (IS_BROXTON(ring->dev) & !lazy_coherency)
> 
> Should be &&.

Thanks for catching it. I'll send a v2 with this fixed if there is no more feedback.

> 
> BR,
> Jani.
> 
> > +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> > +
> >  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> >  }
> >  
> >  static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
> >  {
> >  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> > +
> > +	/* See gen8_get_seqno() explaining the reason for the clflush. */
> > +	if (IS_BROXTON(ring->dev))
> > +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> >  }
> >  
> >  static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 39f6dfc..224a25b 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
> >  	return idx;
> >  }
> >  
> > +static inline void
> > +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
> > +{
> > +	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
> > +			       sizeof(uint32_t));
> > +}
> > +
> >  static inline u32
> >  intel_read_status_page(struct intel_engine_cs *ring,
> >  		       int reg)
> > -- 
> > 2.1.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx