Re: [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

Matt Roper <matthew.d.roper@xxxxxxxxx> · Thu, 20 Apr 2023 13:52:38 -0700

On Wed, Apr 19, 2023 at 04:00:54PM -0700, fei.yang@xxxxxxxxx wrote:
> From: Fei Yang <fei.yang@xxxxxxxxx>
> 
> This patch implements Wa_22016122933.
> 
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).

Is this description accurate?  This patch doesn't insert an mfence
instruction itself, it just calls intel_guc_write_barrier().  On
platforms like MTL that aren't using local memory, that issues a wmb()
barrier, which I believe is implemented as an sfence, not mfence.  You'd
need to be doing a mb() call to get an mfence.

I think in general this level of explanation is unnecessary; you can
just give a high-level description indicating that we force the
write-combine buffer to be flushed and not give the low-level specifics
of what instruction that translates to at the x86 level.

Aside from simplifying the commit message,

Reviewed-by: Matt Roper <matthew.d.roper@xxxxxxxxx>

> 
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
> 
> BSpec: 45101
> 
> Signed-off-by: Fei Yang <fei.yang@xxxxxxxxx>
> Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Acked-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>  	return available - len;
>  
>  corrupted:
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation