Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> writes:

> Imre Deak <imre.deak@xxxxxxxxx> writes:
>
>> By running igt/store_dword_loop_render on BXT we can hit a coherency
>> problem where the seqno written at GPU command completion time is not
>> seen by the CPU. This results in __i915_wait_request seeing the stale
>> seqno and not completing the request (not considering the lost
>> interrupt/GPU reset mechanism). I also verified that this isn't a case
>> of a lost interrupt, or that the command didn't complete somehow: when
>> the coherency issue occurred I read the seqno via an uncached GTT mapping
>> too. While the cached version of the seqno still showed the stale value
>> the one read via the uncached mapping was the correct one.
>>
>> Work around this issue by clflushing the corresponding CPU cacheline
>> following any store of the seqno and preceding any reading of it. When
>> reading it do this only when the caller expects a coherent view.
>>
>> Testcase: igt/store_dword_loop_render
>> Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
>>  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9f5485d..88bc5525 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
>>
>>  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>>  {
>> +	/*
>> +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
>> +	 * storing the completed request's seqno occasionally doesn't
>> +	 * invalidate the CPU cache. Work around this by clflushing the
>> +	 * corresponding cacheline whenever the caller wants the coherency to
>> +	 * be guaranteed. Note that this cacheline is known to be
>> +	 * clean at this point, since we only write it in gen8_set_seqno(),
>> +	 * where we also do a clflush after the write. So this clflush in
>> +	 * practice becomes an invalidate operation.
>> +	 */
>> +	if (IS_BROXTON(ring->dev) & !lazy_coherency)
>
> s/&/&& ?

s//Read The Whole Thread Before Replying

-Mika

> -Mika
>
>> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>> +
>>  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>>  }
>>
>>  static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>>  {
>>  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
>> +
>> +	/* See gen8_get_seqno() explaining the reason for the clflush. */
>> +	if (IS_BROXTON(ring->dev))
>> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>>  }
>>
>>  static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 39f6dfc..224a25b 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
>>  	return idx;
>>  }
>>
>> +static inline void
>> +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
>> +{
>> +	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
>> +			       sizeof(uint32_t));
>> +}
>> +
>>  static inline u32
>>  intel_read_status_page(struct intel_engine_cs *ring,
>>  		       int reg)
>> --
>> 2.1.4
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx