Quoting Sagar Arun Kamble (2017-11-27 07:20:01)
>
> On 11/26/2017 5:50 PM, Chris Wilson wrote:
> > An interesting discussion regarding "hybrid interrupt polling" for NVMe
> > came to the conclusion that the ideal busyspin before sleeping
>
> I think the hybrid approach suggests sleep (1/2 duration) and then busyspin
> (1/2 duration) (for small I/O size), so this should be "busyspin after
> sleeping", although we are not doing exactly the same.

It does; we are not. For reasons I thought I had described ... but not
here, apparently. We should include the differences between hybrid
interrupt polling and our approach in the comments as well.
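To make the contrast concrete, the two strategies look roughly like this
(a minimal userspace sketch; the completion flag, helper names and timings
are illustrative only, not the actual NVMe or i915 code):

/* Contrast of the two wait strategies discussed above. */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static void sleep_us(long us) /* assumes us < 1000000 */
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = us * 1000 };

	nanosleep(&ts, NULL);
}

static bool spin_us(atomic_bool *done, long timeout_us)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		if (atomic_load(done))
			return true;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000L +
		 (now.tv_nsec - start.tv_nsec) / 1000 < timeout_us);

	return atomic_load(done);
}

/* NVMe hybrid polling: sleep for ~half the expected completion time,
 * then busyspin until the request completes.
 */
static void wait_hybrid(atomic_bool *done, long expected_us)
{
	sleep_us(expected_us / 2);
	while (!atomic_load(done))
		; /* busyspin for the remainder */
}

/* Ours: a short optimistic spin before arming the IRQ, then sleep, with
 * another short spin after each wakeup, since we may be woken by requests
 * completing ahead of the one we are waiting for.
 */
static void wait_ours(atomic_bool *done, long spin_irq_us, long spin_cs_us)
{
	if (spin_us(done, spin_irq_us)) /* cf. DRM_I915_SPIN_REQUEST_IRQ */
		return;

	while (!atomic_load(done)) {
		sleep_us(100); /* stand-in for sleeping on the interrupt */
		if (spin_us(done, spin_cs_us)) /* cf. DRM_I915_SPIN_REQUEST_CS */
			return;
	}
}

The key difference is where the spin sits relative to the sleep: hybrid
polling spends its whole budget after a deliberate sleep, whereas we spin
before ever arming the interrupt, and again on every wakeup.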
> > was half
> > of the expected request latency (and better if it was already halfway
> > through that request). This suggested that we too should look again at
> > our tradeoff between spinning and waiting. Currently, our spin simply
> > tries to hide the cost of enabling the interrupt, which is good to avoid
> > penalising nop requests (i.e. test throughput) and not much else.
> > Studying real world workloads suggests that a spin of upto 500us can
> > dramatically boost performance, but the suggestion is that this is not
> > from avoiding interrupt latency per se, but from secondary effects of
> > sleeping, such as allowing the CPU to reduce cstate and context switch
> > away.
> >
> > v2: Expose the spin setting via Kconfig options for easier adjustment
> >     and testing.
> > v3: Don't get caught sneaking in a change to the busyspin parameters.
> >
> > Suggested-by: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Sagar Kamble <sagar.a.kamble@xxxxxxxxx>
> > Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/Kconfig            |  6 ++++++
> >  drivers/gpu/drm/i915/Kconfig.profile    | 23 +++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_gem_request.c | 28 ++++++++++++++++++++++++----
> >  3 files changed, 53 insertions(+), 4 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/Kconfig.profile
> >
> > diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> > index dfd95889f4b7..eae90783f8f9 100644
> > --- a/drivers/gpu/drm/i915/Kconfig
> > +++ b/drivers/gpu/drm/i915/Kconfig
> > @@ -131,3 +131,9 @@ depends on DRM_I915
> >  	depends on EXPERT
> >  	source drivers/gpu/drm/i915/Kconfig.debug
> >  endmenu
> > +
> > +menu "drm/i915 Profile Guided Optimisation"
> > +	visible if EXPERT
> > +	depends on DRM_I915
> > +	source drivers/gpu/drm/i915/Kconfig.profile
> > +endmenu
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > new file mode 100644
> > index 000000000000..a1aed0e2aad5
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -0,0 +1,23 @@
> > +config DRM_I915_SPIN_REQUEST_IRQ
> > +	int
> > +	default 5 # microseconds
> > +	help
> > +	  Before sleeping waiting for a request (GPU operation) to complete,
> > +	  we may spend some time polling for its completion. As the IRQ may
> > +	  take a non-negligible time to setup, we do a short spin first to
> > +	  check if the request will complete quickly.
> > +
> > +	  May be 0 to disable the initial spin.
> > +
> > +config DRM_I915_SPIN_REQUEST_CS
> > +	int
> > +	default 2 # microseconds
> > +	help
> > +	  After sleeping for a request (GPU operation) to complete, we will
> > +	  be woken up on the completion of every request prior to the one
> > +	  being waited on. For very short requests, going back to sleep and
> > +	  being woken up again may add considerably to the wakeup latency. To
> > +	  avoid incurring extra latency from the scheduler, we may choose to
> > +	  spin prior to sleeping again.
> > +
> > +	  May be 0 to disable spinning after being woken.
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index a90bdd26571f..7ac72a0a949c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -1198,8 +1198,21 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> >  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> >  	GEM_BUG_ON(!i915_sw_fence_signaled(&req->submit));
> >
> > -	/* Optimistic short spin before touching IRQs */
> > -	if (__i915_spin_request(req, wait.seqno, state, 5))
> > +	/* Optimistic spin before touching IRQs.
> > +	 *
> > +	 * We may use a rather large value here to offset the penalty of
> > +	 * switching away from the active task. Frequently, the client will
> > +	 * wait upon an old swapbuffer to throttle itself to remain within a
> > +	 * frame of the gpu. If the client is running in lockstep with the gpu,
> > +	 * then it should not be waiting long at all, and a sleep now will incur
> > +	 * extra scheduler latency in producing the next frame. So we sleep
> > +	 * for longer to try and keep the client running.
> > +	 *
> > +	 * We need ~5us to enable the irq, ~20us to hide a context switch.
>
> This comment fits more with the next patch.

It is just a general overview of the sort of timescales we expect. I want
to explain the rationale behind having a spin and what spins we have in
mind. Maybe if I just add "upto".
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx