Quoting Mika Kuoppala (2018-12-28 12:03:26) > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > In preparation for removing the manual EMIT_FLUSH prior to emitting the > > breadcrumb implement the flush inline with writing the breadcrumb for > > ringbuffer emission. > > > > With a combined flush+breadcrumb, we can use a single operation to both > > flush and after the flush is complete (post-sync) write the breadcrumb. > > This gives us a strongly ordered operation that should be sufficient to > > serialise the write before we emit the interrupt; and therefore we may > > take the opportunity to remove the irq_seqno_barrier w/a for gen6+. > > Although using the PIPECONTROL to write the breadcrumb is slower than > > MI_STORE_DWORD_IMM, by combining the operations into one and removing the > > extra flush (next patch) it is faster > > > > For gen2-5, we simply combine the MI_FLUSH into the breadcrumb emission, > > though maybe we could find a solution here to the seqno-vs-interrupt > > issue on Ironlake by mixing up the flush? The answer is no, adding an > > MI_FLUSH before the interrupt is insufficient. > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 104 ++++++++++++++++++++++-- > > 1 file changed, 97 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index d5a9edbcf1be..169c54995ca9 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -217,7 +217,7 @@ gen4_render_ring_flush(struct i915_request *rq, u32 mode) > > * really our business. That leaves only stall at scoreboard. > > */ > > static int > > -intel_emit_post_sync_nonzero_flush(struct i915_request *rq) > > +gen6_emit_post_sync_nonzero_flush(struct i915_request *rq) > > { > > u32 scratch_addr = i915_scratch_offset(rq->i915) + 2 * CACHELINE_BYTES; > > u32 *cs; > > @@ -257,7 +257,7 @@ gen6_render_ring_flush(struct i915_request *rq, u32 mode) > > int ret; > > > > /* Force SNB workarounds for PIPE_CONTROL flushes */ > > - ret = intel_emit_post_sync_nonzero_flush(rq); > > + ret = gen6_emit_post_sync_nonzero_flush(rq); > > if (ret) > > return ret; > > > > @@ -300,6 +300,37 @@ gen6_render_ring_flush(struct i915_request *rq, u32 mode) > > return 0; > > } > > > > +static void gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs) > > +{ > > + /* First we do the gen6_emit_post_sync_nonzero_flush w/a */ > > + *cs++ = GFX_OP_PIPE_CONTROL(4); > > + *cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD; > > + *cs++ = 0; > > + *cs++ = 0; > > + > > + *cs++ = GFX_OP_PIPE_CONTROL(4); > > + *cs++ = PIPE_CONTROL_QW_WRITE; > > + *cs++ = i915_scratch_offset(rq->i915) | PIPE_CONTROL_GLOBAL_GTT; > > + *cs++ = 0; > > + > > + /* Finally we can flush and with it emit the breadcrumb */ > > + *cs++ = GFX_OP_PIPE_CONTROL(4); > > + *cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | > > + PIPE_CONTROL_DEPTH_CACHE_FLUSH | > > + PIPE_CONTROL_DC_FLUSH_ENABLE | > > + PIPE_CONTROL_QW_WRITE | > > + PIPE_CONTROL_CS_STALL); > > + *cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT; > > + *cs++ = rq->global_seqno; > > + > > + *cs++ = MI_USER_INTERRUPT; > > + *cs++ = MI_NOOP; > > + > > + rq->tail = intel_ring_offset(rq, cs); > > + assert_ring_tail_valid(rq->ring, rq->tail); > > +} > > +static const int gen6_rcs_emit_breadcrumb_sz = 14; > > + > > static int > > gen7_render_ring_cs_stall_wa(struct i915_request *rq) > > { > > @@ -379,6 +410,39 @@ gen7_render_ring_flush(struct i915_request *rq, u32 mode) > > return 0; > > } > > > > +static void gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs) > > +{ > > + *cs++ = GFX_OP_PIPE_CONTROL(4); > > + *cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | > > + PIPE_CONTROL_DEPTH_CACHE_FLUSH | > > + PIPE_CONTROL_DC_FLUSH_ENABLE | > > + PIPE_CONTROL_FLUSH_ENABLE | > > + PIPE_CONTROL_QW_WRITE | > > + PIPE_CONTROL_GLOBAL_GTT_IVB | > > + PIPE_CONTROL_CS_STALL); > > There is a faint scent of engine->flush_flags in the air but > I think you favour opencoding everything behind these > indirect calls. Maybe an EMIT_FLUSH | EMIT_DWORD + u32 parameter. I did also at one stage consider replacing the manual emit_breadcrumb_sz with a dummy emission during module init (i.e. emit the tail and cound). That still seems like an improvement as the manual sz is quite easy to break. The reason I stuck with extending ->emit_breadcrumb with an inline flush is that it becomes hairy in the next few patches, and I prefer getting stuck into the mess without too much indirection. But if we do see a way of coalescing, or see a cleaner abstraction, we can always come back. Perhaps though, this might be the last time we need to venture in here? :) -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx