On Mon, Nov 05, 2018 at 09:43:05AM +0000, Chris Wilson wrote:
> Exercising the gpu reloc path strenuously revealed an issue where the
> updated relocations (from MI_STORE_DWORD_IMM) were not being observed
> upon execution. After some experiments with adding pipecontrols (a lot
> of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
> controls or even the current one), it was discovered that we merely
> needed to delay the EMIT_INVALIDATE by several flushes. It is important
> to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
> needs the delay as opposed to what one might first expect -- that the
> delay is required for the TLB invalidation to take effect (one presumes
> to purge any CS buffers) as opposed to a delay after flushing to ensure
> the writes have landed before triggering invalidation.
>
> Testcase: igt/gem_tiled_fence_blits
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 +++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b8a7a014d46d..87eebc13c0d8 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -91,6 +91,7 @@ static int
>  gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  {
>  	u32 cmd, *cs;
> +	int i;
>
>  	/*
>  	 * read/write caches:
> @@ -127,12 +128,45 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode)
>  			cmd |= MI_INVALIDATE_ISP;
>  	}
>
> -	cs = intel_ring_begin(rq, 2);
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>
>  	*cs++ = cmd;
> -	*cs++ = MI_NOOP;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that it is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = i915_ggtt_offset(rq->engine->scratch) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}

This smells a lot like the snb a/b w/a, except there the spec says to use
8 STORE_DWORDs. I suppose the choice of a specific command isn't critical,
and it's just a matter of stuffing the pipeline with something that takes
long enough to let the TLB invalidate finish?

Anyways, the patch itself seems as reasonable as one might expect for an
issue like this.

Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>

> +
> +	*cs++ = cmd;
> +
>  	intel_ring_advance(rq, cs);
>
>  	return 0;
> --
> 2.19.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Ville Syrjälä
Intel