By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt or of the command somehow not completing: when the
coherency issue occurred I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale
value, the one read via the uncached mapping was the correct one.

Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it, do this only when the caller expects a coherent view.

Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f5485d..88bc5525 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
 
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
+	/*
+	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
+	 * storing the completed request's seqno occasionally doesn't
+	 * invalidate the CPU cache. Work around this by clflushing the
+	 * corresponding cacheline whenever the caller wants the coherency to
+	 * be guaranteed. Note that this cacheline is known to be clean at
+	 * this point, since we only write it in gen8_set_seqno(), where we
+	 * also do a clflush after the write. So this clflush in practice
+	 * becomes an invalidate operation.
+	 */
+	if (IS_BROXTON(ring->dev) && !lazy_coherency)
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
+
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+
+	/* See gen8_get_seqno() explaining the reason for the clflush. */
+	if (IS_BROXTON(ring->dev))
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..224a25b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
 	return idx;
 }
 
+static inline void
+intel_flush_status_page(struct intel_engine_cs *ring, int reg)
+{
+	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
+			       sizeof(uint32_t));
+}
+
 static inline u32
 intel_read_status_page(struct intel_engine_cs *ring,
 		       int reg)
-- 
2.1.4
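
[Editor's note, not part of the patch: a minimal caller-side sketch of how a waiter
could poll the seqno through the non-lazy path so that the clflush added above
actually runs on BXT. It assumes only struct intel_engine_cs and the get_seqno()
vfunc from intel_ringbuffer.h; the example_* helpers are hypothetical names for
illustration and do not exist in the driver.]

/* Assumes <linux/types.h> and intel_ringbuffer.h are included. */

/* Wraparound-safe seqno comparison, same idea as i915_seqno_passed(). */
static inline bool example_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}

/*
 * Poll for completion with lazy_coherency=false, so that gen8_get_seqno()
 * clflushes the status page cacheline on BXT before reading it and the CPU
 * sees the value the GPU actually stored.
 */
static inline bool example_request_done(struct intel_engine_cs *ring,
					u32 target_seqno)
{
	return example_seqno_passed(ring->get_seqno(ring, false), target_seqno);
}

Callers that can tolerate a momentarily stale value would keep passing
lazy_coherency=true and so avoid the clflush cost, matching the "only when the
caller expects a coherent view" rule in the commit message.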