On 21/11/2017 15:24, Chris Wilson wrote:
Instead of sleeping for a fixed 1ms (roughly, depending on timer slack), start with a small sleep and exponentially increase the sleep on each cycle. A good example of a beneficiary is the guc mmio communication channel. Typically we expect (and so spin) for 10us for a quick response, but this doesn't cover everything and so sometimes we fallback to the millisecond+ sleep. This incurs a significant delay in time-critical operations like preemption (igt/gem_exec_latency), which can be improved significantly by using a small sleep after the spin fails. We've made this suggestion many times, but had little experimental data to support adding the complexity. References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits") Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Cc: John Harrison <John.C.Harrison@xxxxxxxxx> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> Cc: Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/intel_drv.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 69aab324aaa1..c1ea9a009eb4 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -50,6 +50,7 @@ */ #define _wait_for(COND, US, W) ({ \ unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1; \ + long wait__ = 1; \ int ret__; \ might_sleep(); \ for (;;) { \ @@ -62,7 +63,9 @@ ret__ = -ETIMEDOUT; \ break; \ } \ - usleep_range((W), (W) * 2); \ + usleep_range(wait__, wait__ * 2); \ + if (wait__ < (W)) \ + wait__ <<= 1; \ } \ ret__; \ })
I would start the period at 10us since a) <10us is not recommended for usleep family, b) most callers specify ms timeouts so <10us poll is perhaps an overkill.
Latency sensitive callers like __intel_wait_for_register_us can be tweaked at the call site to provide what they want.
For the actual guc mmio send it sounds like it should pass in 20us to __intel_wait_for_register_us (referring to John's explanation email) to cover 99% of the cases. And then the remaining 1% could be fine with a 10us delay?
Otherwise we are effectively making _wait_for partially busy looping, or whatever the inefficiency in <10us usleep is. I mean, it makes no practical difference to make a handful of quick loops there but it feels a bit inelegant.
Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx