Quoting Tvrtko Ursulin (2017-11-21 17:00:00)
>
> On 21/11/2017 15:24, Chris Wilson wrote:
> > Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> > start with a small sleep and exponentially increase the sleep on each
> > cycle.
> >
> > A good example of a beneficiary is the guc mmio communication channel.
> > Typically we expect (and so spin) for 10us for a quick response, but this
> > doesn't cover everything and so sometimes we fallback to the millisecond+
> > sleep. This incurs a significant delay in time-critical operations like
> > preemption (igt/gem_exec_latency), which can be improved significantly by
> > using a small sleep after the spin fails.
> >
> > We've made this suggestion many times, but had little experimental data
> > to support adding the complexity.
> >
> > References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
> > Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> > Cc: Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 69aab324aaa1..c1ea9a009eb4 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -50,6 +50,7 @@
> >   */
> >  #define _wait_for(COND, US, W) ({ \
> >  	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \
> > +	long wait__ = 1; \
> >  	int ret__; \
> >  	might_sleep(); \
> >  	for (;;) { \
> > @@ -62,7 +63,9 @@
> >  			ret__ = -ETIMEDOUT; \
> >  			break; \
> >  		} \
> > -		usleep_range((W), (W) * 2); \
> > +		usleep_range(wait__, wait__ * 2); \
> > +		if (wait__ < (W)) \
> > +			wait__ <<= 1; \
> >  	} \
> >  	ret__; \
> > })
>
> I would start the period at 10us since a) <10us is not
> recommended for usleep family, b) most callers specify ms timeouts so
> <10us poll is perhaps an overkill.

Don't forget the majority of the callers are now via wait_for_register
and so have that 10us poll prior to the sleeping wait_for(). If there's
an ARCH_USLEEP_MIN_WHATNOT that would be useful.

> Latency sensitive callers like __intel_wait_for_register_us can be
> tweaked at the call site to provide what they want.
>
> For the actual guc mmio send it sounds like it should pass in 20us to
> __intel_wait_for_register_us (referring to John's explanation email) to
> cover 99% of the cases. And then the remaining 1% could be fine with a
> 10us delay?

That it fixed that was a side-effect ;) It just happened to be something
that I could measure the latency of in userspace. I'd rather we have
something generic that does have a demonstrable improvement.

> Otherwise we are effectively making _wait_for partially busy looping, or
> whatever the inefficiency in <10us usleep is. I mean, it makes no
> practical difference to make a handful of quick loops there but it feels
> a bit inelegant.

We already do use the hybrid busy-looping for wait_for_register (and
everybody is meant to be using wait_for_register, the exceptions should
be rare and well justified). The purpose of referencing 1758b90e38f5
("drm/i915: Use a hybrid scheme for fast register waits") was to remind
ourselves of the scheme and its benefits.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx