Re: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 18/03/2022 05:26, fei.yang@xxxxxxxxx wrote:
From: Fei Yang <fei.yang@xxxxxxxxx>

GPU hangs have been observed when multiple engines write to the
same aux_inv register at the same time. To avoid this each engine
should only invalidate its own auxiliary table. The function
gen12_emit_flush_xcs() currently invalidate the auxiliary table for
all engines because the rq->engine is not necessarily the engine
eventually carrying out the request, and potentially the engine
could even be a virtual one (with engine->instance being -1).
With the MMIO remap feature, we can actually set bit 17 of MI_LRI
instruction and let the hardware to figure out the local aux_inv
register at runtime to avoid invalidating auxiliary table for all
engines.

Bspec: 45728

Cc: Stuart Summers <stuart.summers@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Signed-off-by: Chris Wilson <chris.p.wilson@xxxxxxxxx>
Signed-off-by: Fei Yang <fei.yang@xxxxxxxxx>
---
  drivers/gpu/drm/i915/gt/gen8_engine_cs.c     | 42 +++++---------------
  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  1 +
  2 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 36148887c699..d440c5dfb6b7 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -165,30 +165,6 @@ static u32 preparser_disable(bool state)
  	return MI_ARB_CHECK | 1 << 8 | state;
  }
-static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
-{
-	static const i915_reg_t vd[] = {
-		GEN12_VD0_AUX_NV,
-		GEN12_VD1_AUX_NV,
-		GEN12_VD2_AUX_NV,
-		GEN12_VD3_AUX_NV,
-	};
-
-	static const i915_reg_t ve[] = {
-		GEN12_VE0_AUX_NV,
-		GEN12_VE1_AUX_NV,
-	};
-
-	if (engine->class == VIDEO_DECODE_CLASS)
-		return vd[engine->instance];
-
-	if (engine->class == VIDEO_ENHANCEMENT_CLASS)
-		return ve[engine->instance];
-
-	GEM_BUG_ON("unknown aux_inv reg\n");
-	return INVALID_MMIO_REG;
-}
-
  static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
  {
  	*cs++ = MI_LOAD_REGISTER_IMM(1);
@@ -296,7 +272,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
  		if (!HAS_FLAT_CCS(rq->engine->i915)) {
  			aux_inv = rq->engine->mask & ~BIT(BCS0);
  			if (aux_inv)
-				cmd += 2 * hweight32(aux_inv) + 2;
+				cmd += 4;
  		}
  	}
@@ -329,14 +305,16 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
  	*cs++ = 0; /* value */
if (aux_inv) { /* hsdes: 1809175790 */
-		struct intel_engine_cs *engine;
-		unsigned int tmp;
-
-		*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
-		for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
-			*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
-			*cs++ = AUX_INV;
+		*cs++ = MI_LOAD_REGISTER_IMM(1) | MI_LRI_MMIO_REMAP_EN;

Cool, I didn't know this exists. First Bspec link I found did not mention these register, but second did. That one however (29545) has a worrying "removed by" tag which seems to point to a story suggesting the remapping table might be gone on machines with flat ccs?! Could you double check please?

+		if (rq->engine->class == VIDEO_DECODE_CLASS) {
+			*cs++ = i915_mmio_reg_offset(GEN12_VD0_AUX_NV);
+		} else if (rq->engine->class == VIDEO_ENHANCEMENT_CLASS) {
+			*cs++ = i915_mmio_reg_offset(GEN12_VE0_AUX_NV);
+		} else {
+			GEM_BUG_ON("unknown aux_inv reg\n");
+			*cs++ = i915_mmio_reg_offset(INVALID_MMIO_REG);

I'd consider not emitting the LRI if we don't know what to put in unless there is some hidden point to do it?

  		}
+		*cs++ = AUX_INV;
  		*cs++ = MI_NOOP;
  	}
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index d112ffd56418..2d150eec5c65 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -144,6 +144,7 @@
  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
  /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
  #define   MI_LRI_LRM_CS_MMIO		REG_BIT(19)
+#define   MI_LRI_MMIO_REMAP_EN		(1 << 17)
  #define   MI_LRI_FORCE_POSTED		(1<<12)

Passing observation - three bits, three flavours of expressing them, sigh...

Regards,

Tvrtko

  #define MI_LOAD_REGISTER_IMM_MAX_REGS (126)
  #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux