Quoting Kamble, Sagar A (2017-09-15 10:15:31)
> Thanks Chris. LGTM.
>
> Minor inputs below
>
> On 9/14/2017 3:28 PM, Chris Wilson wrote:
> > An interesting discussion regarding "hybrid interrupt polling" for NVMe
> > came to the conclusion that the ideal busyspin before sleeping was half
> > of the expected request latency (and better if it was already halfway
> > through that request). This suggested that we too should look again at
> > our tradeoff between spinning and waiting. Currently, our spin simply
> > tries to hide the cost of enabling the interrupt, which is good to avoid
> > penalising nop requests (i.e. test throughput) and not much else.
> > Studying real world workloads suggests that a spin of upto 500us can
> > dramatically boost performance, but the suggestion is that this is not
> > from avoiding interrupt latency per-se, but from secondary effects of
> > sleeping such as allowing the CPU reduce cstate and context switch away.
> > To offset those costs from penalising the active client, bump the initial
> > spin somewhat to 250us and the secondary spin to 20us to balance the cost
> > of another context switch following the interrupt.
> >
> > Suggested-by: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
> > Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_request.c | 25 +++++++++++++++++++++----
> >  1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index 813a3b546d6e..ccbdaf6a0e4d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -1155,8 +1155,20 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> >          GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> >          GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
> >
> > -        /* Optimistic short spin before touching IRQs */
> > -        if (i915_spin_request(req, state, 5))
> > +        /* Optimistic short spin before touching IRQs.
> > +         *
> > +         * We use a rather large value here to offset the penalty of switching
> > +         * away from the active task. Frequently, the client will wait upon
> > +         * an old swapbuffer to throttle itself to remain within a frame of
> > +         * the gpu. If the client is running in lockstep with the gpu, then
> > +         * it should not be waiting long at all, and a sleep now will incur
> > +         * extra scheduler latency in producing the next frame. So we sleep
> Typo? Should this be "So we spin for longer"
> > +         * for longer to try and keep the client running.
> > +         *
> > +         * We need ~5us to enable the irq, ~20us to hide a context switch,
> > +         * we use 250us to keep the cache hot.
> > +         */
> > +        if (i915_spin_request(req, state, 250))
> Hybrid polling introduces sleep of 1/2 request duration prior to poll of
> 1/2 request duration.
> Should we reorder here to achieve the same?

No, we are not tracking estimated time of completion for a request. We
don't know when the request was started on the hw, nor when it is
likely to complete. Now we could bookend the batch with TIMESTAMPs and
with a lot of extra work track an ewma per context per engine, but even
with some magic, I don't think that will work across preemption.
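For illustration only (hypothetical names, plain userspace C rather than
i915 code): if batches were bookended with timestamps, an ewma of request
duration per context/engine could drive the hybrid split Sagar describes,
sleeping for half of the estimated remaining time before polling the rest.
A minimal sketch, assuming such duration samples were available:

/*
 * Illustrative sketch only, not i915 code: all names here are made up.
 * An ewma of request duration (fed by timestamp bookends around each
 * batch) drives a hybrid split: sleep for half of the estimated
 * remaining time, then busy-poll the rest.
 */
#include <stdint.h>
#include <stdio.h>

struct engine_stats {
        uint64_t ewma_ns;       /* smoothed request duration, 0 = no samples yet */
};

/* Standard ewma with a 1/8 weight for the newest sample. */
static void ewma_update(struct engine_stats *st, uint64_t sample_ns)
{
        if (!st->ewma_ns)
                st->ewma_ns = sample_ns;
        else
                st->ewma_ns = st->ewma_ns - (st->ewma_ns >> 3) + (sample_ns >> 3);
}

/*
 * Split the expected remaining time into a sleep half and a spin half,
 * given how long the request has already been executing on the hw.
 */
static void wait_budget(const struct engine_stats *st, uint64_t elapsed_ns,
                        uint64_t *sleep_ns, uint64_t *spin_ns)
{
        uint64_t remain = st->ewma_ns > elapsed_ns ? st->ewma_ns - elapsed_ns : 0;

        *sleep_ns = remain / 2;
        *spin_ns = remain - *sleep_ns;
}

int main(void)
{
        struct engine_stats st = { 0 };
        const uint64_t samples[] = { 800000, 760000, 820000, 790000 }; /* ~0.8ms batches */
        uint64_t sleep_ns, spin_ns;
        unsigned int i;

        for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
                ewma_update(&st, samples[i]);

        /* pretend the request has already been running for 200us */
        wait_budget(&st, 200000, &sleep_ns, &spin_ns);
        printf("sleep %lluus then spin %lluus\n",
               (unsigned long long)(sleep_ns / 1000),
               (unsigned long long)(spin_ns / 1000));
        return 0;
}

The objection above still applies: without knowing when the hw actually
started the request, elapsed_ns is unknowable, and preemption would
invalidate the estimate anyway.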
Yes, it makes a lot of sense not to spin if we don't expect the request
to complete. The closest we have at present (and currently use) is
merely a flag saying whether the request is currently executing.
-Chris
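In the same spirit, a hedged sketch of that flag-based gate (again
hypothetical names, not the real i915 wait path): with no completion
estimate available, the optimistic spin is simply skipped for requests
that have not started executing on the hw yet.

/*
 * Hypothetical sketch, not the real i915 helpers: the only signal is a
 * boolean "already executing" flag, so spinning is only attempted when
 * the request is at least running on the hw.
 */
#include <stdbool.h>
#include <stdio.h>

struct request {
        bool started;   /* has the hw begun executing this request? */
        bool completed; /* has it signalled completion? */
};

/* Returns true if the spin caught the request completing. */
static bool try_optimistic_spin(const struct request *rq, unsigned int timeout_us)
{
        if (!rq->started)
                return false;   /* not on the hw yet: go straight to the irq wait */

        /* a real implementation would busy-poll rq->completed for up to timeout_us */
        (void)timeout_us;
        return rq->completed;
}

int main(void)
{
        struct request queued = { .started = false, .completed = false };
        struct request running = { .started = true, .completed = true };

        printf("queued request:  spin hit = %d\n", try_optimistic_spin(&queued, 250));
        printf("running request: spin hit = %d\n", try_optimistic_spin(&running, 250));
        return 0;
}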