Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> · Tue, 09 Jun 2015 11:21:01 +0300

On Mon, 08 Jun 2015, Imre Deak <imre.deak@xxxxxxxxx> wrote:
> By running igt/store_dword_loop_render on BXT we can hit a coherency
> problem where the seqno written at GPU command completion time is not
> seen by the CPU. This results in __i915_wait_request seeing the stale
> seqno and not completing the request (not considering the lost
> interrupt/GPU reset mechanism). I also verified that this isn't a case
> of a lost interrupt, or that the command didn't complete somehow: when
> the coherency issue occurred I read the seqno via an uncached GTT mapping
> too. While the cached version of the seqno still showed the stale value
> the one read via the uncached mapping was the correct one.
>
> Work around this issue by clflushing the corresponding CPU cacheline
> following any store of the seqno and preceding any reading of it. When
> reading it do this only when the caller expects a coherent view.
>
> Testcase: igt/store_dword_loop_render
> Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
>  2 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9f5485d..88bc5525 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
>  
>  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>  {
> +	/*
> +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
> +	 * storing the completed request's seqno occasionally doesn't
> +	 * invalidate the CPU cache. Work around this by clflushing the
> +	 * corresponding cacheline whenever the caller wants the coherency to
> +	 * be guaranteed. Note that this cacheline is known to be
> +	 * clean at this point, since we only write it in gen8_set_seqno(),
> +	 * where we also do a clflush after the write. So this clflush in
> +	 * practice becomes an invalidate operation.
> +	 */
> +	if (IS_BROXTON(ring->dev) & !lazy_coherency)

Should be &&.
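
To illustrate why the operator matters, here is a minimal standalone
sketch, not taken from the driver. It assumes a platform check that
returns a raw flag bit instead of a strict 0/1 value; fake_is_platform()
and FAKE_PLATFORM_FLAG are hypothetical names made up for the example.
Whether IS_BROXTON() can actually return something other than 0 or 1 is
not visible from this patch, so with the real macro the two forms may
well evaluate the same and the difference is then one of intent and
robustness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit: non-zero, but not equal to 1. */
#define FAKE_PLATFORM_FLAG 0x4

/* Hypothetical platform check that returns the raw masked bit. */
static inline int fake_is_platform(int flags)
{
	return flags & FAKE_PLATFORM_FLAG;
}

/* Bitwise form, as in the quoted hunk: ANDs the raw bit patterns. */
static inline bool need_flush_bitwise(int flags, bool lazy_coherency)
{
	return fake_is_platform(flags) & !lazy_coherency;
}

/* Logical form: true whenever both operands are non-zero. */
static inline bool need_flush_logical(int flags, bool lazy_coherency)
{
	return fake_is_platform(flags) && !lazy_coherency;
}

/* Small self-check showing where the two forms diverge. */
static inline void check_need_flush(void)
{
	/* 0x4 && 1 is true, so the flush is requested. */
	assert(need_flush_logical(FAKE_PLATFORM_FLAG, false));
	/* 0x4 & 1 is 0, so the flush is silently skipped. */
	assert(!need_flush_bitwise(FAKE_PLATFORM_FLAG, false));
	/* With lazy coherency requested, neither form flushes. */
	assert(!need_flush_logical(FAKE_PLATFORM_FLAG, true));
	assert(!need_flush_bitwise(FAKE_PLATFORM_FLAG, true));
}
```

When both operands are guaranteed to be 0 or 1 the two forms give the
same truth value, but && states the intent, short-circuits, and keeps
working if the left-hand check ever stops returning a strict boolean.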

BR,
Jani.

> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
> +
>  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>  }
>  
>  static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>  {
>  	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
> +
> +	/* See gen8_get_seqno() explaining the reason for the clflush. */
> +	if (IS_BROXTON(ring->dev))
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>  }
>  
>  static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 39f6dfc..224a25b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
>  	return idx;
>  }
>  
> +static inline void
> +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
> +{
> +	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
> +			       sizeof(uint32_t));
> +}
> +
>  static inline u32
>  intel_read_status_page(struct intel_engine_cs *ring,
>  		       int reg)
> -- 
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center