Quoting Tvrtko Ursulin (2018-12-31 10:43:07) > > On 28/12/2018 17:16, Chris Wilson wrote: > > The irq_seqno_barrier is a tradeoff between doing work on every request > > (on the GPU) and doing work after every interrupt (on the CPU). We > > presume we have many more requests than interrupts! However, the current > > w/a for Ivybridge is an implicit delay that currently fails sporadically > > and consistently if we move the w/a into the irq handler itself. This > > makes the CPU barrier untenable for upcoming interrupt handler changes > > and so we need to replace it with a delay on the GPU before we send the > > MI_USER_INTERRUPT. As it turns out that delay is 32x MI_STORE_DWORD_IMM, > > or about 0.6us per request! Quite nasty, but the lesser of two evils > > looking to the future. > > > > Testcase: igt/gem_sync > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++++++++----------- > > 1 file changed, 44 insertions(+), 36 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index 2fb3a364c390..dd996103d495 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -443,6 +443,34 @@ static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs) > > } > > static const int gen6_xcs_emit_breadcrumb_sz = 4; > > > > +#define GEN7_XCS_WA 32 > > +static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs) > > +{ > > + int i; > > + > > + *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW; > > + *cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT; > > + *cs++ = rq->global_seqno; > > + > > + for (i = 0; i < GEN7_XCS_WA; i++) { > > + *cs++ = MI_STORE_DWORD_INDEX; > > + *cs++ = I915_GEM_HWS_INDEX_ADDR; > > + *cs++ = rq->global_seqno; > > + } > > + > > + *cs++ = MI_FLUSH_DW; > > + *cs++ = 0; > > + *cs++ = 0; > > + > > + *cs++ = MI_USER_INTERRUPT; > > + *cs++ = MI_NOOP; > > + > > + rq->tail = intel_ring_offset(rq, cs); > > + assert_ring_tail_valid(rq->ring, rq->tail); > > +} > > +static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3; > > Is it worth moving this before the function, and then in the function > have a little GEM_BUG_ON using cs pointer arithmetic to check the two > sizes match? Or even simpler the function could use > engine->emit_breadcrumb_sz to check against. I have a sketch of an idea to build a request during intel_engine_setup_common() and measure the actual breadcrumb_sz. Which I think is the best way to remove the fragile counting by hand. (But it required more than 5 minutes of mocking around, so it ended up on the list of things to do later.) Later on, we do add a check that the emit_breadcrumb matches the breadcrumb_sz when we move the emission into i915_request_add() and can rely on ring->emit being valid. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx