On Fri, Dec 18, 2015 at 11:59:41AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
>
> We can rely on context complete interrupt to wake up the waiters
> apart in the case where requests are merged into a single ELSP
> submission. In this case we inject MI_USER_INTERRUPTS in the
> ring buffer to ensure prompt wake-ups.
>
> This optimization has the effect on for example GLBenchmark
> Egypt off-screen test of decreasing the number of generated
> interrupts per second by a factor of two, and context switched
> by factor of five to six.

I half like it. Are the interrupts a limiting factor in this case though?
This should be ~100 waits/second with ~1000 batches/second, right? What
is the delay between request completion and client wakeup - difficult to
measure after you remove the user interrupt though! But I estimate it
should be on the order of just a few GPU cycles.

> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 27f06198a51e..d9be878dbde7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -359,6 +359,13 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  	spin_unlock(&dev_priv->uncore.lock);
>  }
>
> +static void execlists_emit_user_interrupt(struct drm_i915_gem_request *req)
> +{
> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
> +
> +	iowrite32(MI_USER_INTERRUPT, ringbuf->virtual_start + req->tail - 8);
> +}
> +
>  static int execlists_update_context(struct drm_i915_gem_request *rq)
>  {
>  	struct intel_engine_cs *ring = rq->ring;
> @@ -433,6 +440,12 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			cursor->elsp_submitted = req0->elsp_submitted;
>  			list_move_tail(&req0->execlist_link,
>  				       &ring->execlist_retired_req_list);
> +			/*
> +			 * When merging requests make sure there is still
> +			 * something after each batch buffer to wake up waiters.
> +			 */
> +			if (cursor != req0)
> +				execlists_emit_user_interrupt(req0);

You may have already missed this instruction as you patch it, and keep
doing so as long as the context is resubmitted. I think to be safe, you
need to patch cursor as well. You could then MI_NOOP out the
MI_USER_INTERRUPT on the terminal request.

An interesting igt experiment I think would be:

thread A, keep queuing batches with just a single MI_STORE_DWORD_IMM *addr
thread B, waits on batch from A, reads *addr (asynchronously), measures
latency (actual value - expected(batch))

Run for 10s, report min/max/median latency.

Repeat for more threads/contexts and more waiters.

Ah, that may be the demonstration for the thundering herd I've been
looking for!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
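
For the bookkeeping side of the igt experiment proposed above, a minimal
sketch in plain C might look like the following. It covers only the
arithmetic (latency as actual value minus expected value, then
min/max/median over the samples) and never touches the GPU. The names
wakeup_latency(), summarise() and struct latency_stats are hypothetical,
not existing igt or i915 helpers, and measuring latency in units of the
value stored by MI_STORE_DWORD_IMM is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary record; not an igt or i915 structure. */
struct latency_stats {
	uint32_t min;
	uint32_t max;
	uint32_t median;
};

/* qsort() comparator for uint32_t that avoids subtraction overflow. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Latency of one wakeup: the value thread B read from *addr minus the
 * value the batch it waited on was expected to write. Assumes thread A
 * stores a monotonically increasing u32 sequence number per batch, so
 * unsigned subtraction stays correct across wraparound.
 */
static uint32_t wakeup_latency(uint32_t actual, uint32_t expected)
{
	return actual - expected;
}

/* Sorts the samples in place and reports min/max/median. */
static struct latency_stats summarise(uint32_t *samples, size_t count)
{
	struct latency_stats s;

	assert(count > 0);
	qsort(samples, count, sizeof(*samples), cmp_u32);

	s.min = samples[0];
	s.max = samples[count - 1];
	s.median = samples[count / 2]; /* upper median for even counts */

	return s;
}
```

Thread B would append one wakeup_latency() result per completed wait
during the 10s run and hand the array to summarise() at the end; a
latency of 0 means the waiter observed exactly the batch it waited on.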