On Tue, Nov 21, 2017 at 08:59:51PM +0000, Chris Wilson wrote: > Instead of sleeping for a fixed 1ms (roughly, depending on timer slack), > start with a small sleep and exponentially increase the sleep on each > cycle. > > A good example of a beneficiary is the guc mmio communication channel. > Typically we expect (and so spin) for 10us for a quick response, but this > doesn't cover everything and so sometimes we fallback to the millisecond+ > sleep. This incurs a significant delay in time-critical operations like > preemption (igt/gem_exec_latency), which can be improved significantly by > using a small sleep after the spin fails. > > We've made this suggestion many times, but had little experimental data > to support adding the complexity. > > v2: Bump the minimum usleep to 10us on advice of > Documentation/timers/timers-howto.txt (Tvrko) > v3: Specify min, max range for usleep intervals -- some code may > crucially depend upon and so want to specify the sleep pattern. Since we see the effects for GuC preeption, let's gather some evidence. (SKL) intel_guc_send_mmio latency: 100 rounds of gem_exec_latency --r '*-preemption' drm-tip: usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 44 | | 16 -> 31 : 1088 | | 32 -> 63 : 832 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 12 | | 512 -> 1023 : 0 | | 1024 -> 2047 : 29899 |********* | 2048 -> 4095 : 131033 |****************************************| exponential backoff: usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 58 | | 16 -> 31 : 2009 | | 32 -> 63 : 153615 |****************************************| 64 -> 127 : 4360 |* | 128 -> 255 : 14 | | 256 -> 511 : 4236 |* | 512 -> 1023 : 4 | | s/10us/100us in __intel_wait_for_register_fw fast_timeout (without exponential backoff): usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 946 | | 16 -> 31 : 159803 |****************************************| 32 -> 63 : 3440 | | 64 -> 127 : 33 | | 128 -> 255 : 1 | | 256 -> 511 : 74 | | same as the previous one, except with lower GPU clock, let's say max == hw_min: usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 26 | | 32 -> 63 : 160411 |****************************************| 64 -> 127 : 3402 | | 128 -> 255 : 12 | | 256 -> 511 : 50 | | 512 -> 1023 : 3 | | 1024 -> 2047 : 1 | | 2048 -> 4095 : 2 | | We're waaay better with exponential backoff compared with current drm-tip, and we're almost as good as with spinning for a longer period. Unfortunately with GuC the delay depends on platform (and/or GPU clock?), so we're still seeing the long tail without exponential backoff if we manage to miss the fast timeout. > References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits") > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Cc: John Harrison <John.C.Harrison@xxxxxxxxx> > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> > Cc: Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx> Reviewed-by: Michał Winiarski <michal.winiarski@xxxxxxxxx> -Michał > --- > drivers/gpu/drm/i915/intel_drv.h | 11 +++++++---- > drivers/gpu/drm/i915/intel_pm.c | 2 +- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h > index 635a96fcd788..c00441a3d649 100644 > --- a/drivers/gpu/drm/i915/intel_drv.h > +++ b/drivers/gpu/drm/i915/intel_drv.h > @@ -48,8 +48,9 @@ > * having timed out, since the timeout could be due to preemption or similar and > * we've never had a chance to check the condition before the timeout. > */ > -#define _wait_for(COND, US, W) ({ \ > +#define _wait_for(COND, US, Wmin, Wmax) ({ \ > unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \ > + long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \ > int ret__; \ > might_sleep(); \ > for (;;) { \ > @@ -62,12 +63,14 @@ > ret__ = -ETIMEDOUT; \ > break; \ > } \ > - usleep_range((W), (W) * 2); \ > + usleep_range(wait__, wait__ * 2); \ > + if (wait__ < (Wmax)) \ > + wait__ <<= 1; \ > } \ > ret__; \ > }) > > -#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 1000) > +#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000) > > /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */ > #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) > @@ -116,7 +119,7 @@ > int ret__; \ > BUILD_BUG_ON(!__builtin_constant_p(US)); \ > if ((US) > 10) \ > - ret__ = _wait_for((COND), (US), 10); \ > + ret__ = _wait_for((COND), (US), 10, 10); \ > else \ > ret__ = _wait_for_atomic((COND), (US), 0); \ > ret__; \ > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c > index e445ec174831..f07f14ae198d 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -9294,7 +9294,7 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request, > ret = 0; > goto out; > } > - ret = _wait_for(COND, timeout_base_ms * 1000, 10); > + ret = _wait_for(COND, timeout_base_ms * 1000, 10, 10); > if (!ret) > goto out; > > -- > 2.15.0 > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx