Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> writes: > On 5/7/20 7:20 AM, Mika Kuoppala wrote: >> All engines, exception being blitter as it does not >> care about the form, can access compressed surfaces. >> >> So we need to add forced aux table invalidates >> for those engines. >> >> v2: virtual instance masking (Chris) >> v3: bug on if not found (Chris) >> >> References: d248b371f747 ("drm/i915/gen12: Invalidate aux table entries forcibly") >> References bspec#43904, hsdes#1809175790 >> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> >> Cc: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> >> Cc: Rafael Antognolli <rafael.antognolli@xxxxxxxxx> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> >> --- >> drivers/gpu/drm/i915/gt/intel_lrc.c | 86 +++++++++++++++++++++++++++-- >> drivers/gpu/drm/i915/i915_reg.h | 6 ++ >> 2 files changed, 87 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c >> index bbdb0e2a4571..e12916e2799b 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >> @@ -4539,11 +4539,36 @@ static u32 preparser_disable(bool state) >> return MI_ARB_CHECK | 1 << 8 | state; >> } >> >> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) >> +{ >> + static const i915_reg_t vd[] = { >> + GEN12_VD0_AUX_NV, >> + GEN12_VD1_AUX_NV, >> + GEN12_VD2_AUX_NV, >> + GEN12_VD3_AUX_NV, >> + }; >> + >> + static const i915_reg_t ve[] = { >> + GEN12_VE0_AUX_NV, >> + GEN12_VE1_AUX_NV, >> + }; >> + >> + if (engine->class == VIDEO_DECODE_CLASS) >> + return vd[engine->instance]; >> + >> + if (engine->class == VIDEO_ENHANCEMENT_CLASS) >> + return ve[engine->instance]; >> + >> + GEM_BUG_ON("unknown aux_inv_reg\n"); >> + >> + return INVALID_MMIO_REG; >> +} >> + >> static u32 * >> -gen12_emit_aux_table_inv(struct i915_request *rq, u32 *cs) >> +gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs) >> { >> *cs++ = MI_LOAD_REGISTER_IMM(1); >> - *cs++ = i915_mmio_reg_offset(GEN12_GFX_CCS_AUX_NV); >> + *cs++ = i915_mmio_reg_offset(inv_reg); >> *cs++ = AUX_INV; >> *cs++ = MI_NOOP; >> >> @@ -4612,7 +4637,7 @@ static int gen12_emit_flush_render(struct i915_request *request, >> cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR); >> >> /* hsdes: 1809175790 */ >> - cs = gen12_emit_aux_table_inv(request, cs); >> + cs = gen12_emit_aux_table_inv(GEN12_GFX_CCS_AUX_NV, cs); >> >> *cs++ = preparser_disable(false); >> intel_ring_advance(request, cs); >> @@ -4621,6 +4646,56 @@ static int gen12_emit_flush_render(struct i915_request *request, >> return 0; >> } >> >> +static int gen12_emit_flush(struct i915_request *request, u32 mode) >> +{ >> + intel_engine_mask_t aux_inv = 0; >> + u32 cmd, *cs; >> + >> + if (mode & EMIT_INVALIDATE) >> + aux_inv = request->engine->mask & ~BIT(BCS0); >> + >> + cs = intel_ring_begin(request, >> + 4 + (aux_inv ? 2 * hweight8(aux_inv) + 2 : 0)); >> + if (IS_ERR(cs)) >> + return PTR_ERR(cs); >> + >> + cmd = MI_FLUSH_DW + 1; >> + >> + /* We always require a command barrier so that subsequent >> + * commands, such as breadcrumb interrupts, are strictly ordered >> + * wrt the contents of the write cache being flushed to memory >> + * (and thus being coherent from the CPU). >> + */ >> + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW; >> + >> + if (mode & EMIT_INVALIDATE) { >> + cmd |= MI_INVALIDATE_TLB; >> + if (request->engine->class == VIDEO_DECODE_CLASS) >> + cmd |= MI_INVALIDATE_BSD; >> + } >> + >> + *cs++ = cmd; >> + *cs++ = LRC_PPHWSP_SCRATCH_ADDR; >> + *cs++ = 0; /* upper addr */ >> + *cs++ = 0; /* value */ >> + >> + if (aux_inv) { /* hsdes: 1809175790 */ >> + struct intel_engine_cs *engine; >> + unsigned int tmp; >> + >> + *cs++ = MI_LOAD_REGISTER_IMM(hweight8(aux_inv)); >> + for_each_engine_masked(engine, request->engine->gt, >> + aux_inv, tmp) { >> + *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine)); >> + *cs++ = AUX_INV; > > Why do we loop through all engines? AFAICS the WA just says to > invalidate the current one. If it is because we don't know what we're > running on, can't we just use the automatic mmio remap on the CS? That > was added on purpose for per-engine registers that are not relative to > the mmio base (see bspec 45606) > > Daniele I looked at the auto remap feature and it should be doable with that feature. Now it will inv with all engine backing up a virtual one. So there is a difference. But we do not have a bug on other engines to require inv. So we play it safe. We would need a media pipeline expert to ack/nack actually. Thanks for the remap io pointer, -Mika _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx