Re: [PATCH] drm/i915: Use exponential backoff for wait_for()

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Tue, 21 Nov 2017 17:00:00 +0000

On 21/11/2017 15:24, Chris Wilson wrote:
Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
start with a small sleep and exponentially increase the sleep on each
cycle.

A good example of a beneficiary is the GuC MMIO communication channel.
Typically we expect a quick response and so spin for 10us, but this
doesn't cover every case and so sometimes we fall back to the
millisecond+ sleep. This incurs a significant delay in time-critical
operations like preemption (igt/gem_exec_latency), which can be improved
significantly by using a small sleep after the spin fails.

We've made this suggestion many times, but had little experimental data
to support adding the complexity.

References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
Cc: Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 69aab324aaa1..c1ea9a009eb4 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -50,6 +50,7 @@
   */
  #define _wait_for(COND, US, W) ({ \
  	unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;	\
+	long wait__ = 1;						\
  	int ret__;							\
  	might_sleep();							\
  	for (;;) {							\
@@ -62,7 +63,9 @@
  			ret__ = -ETIMEDOUT;				\
  			break;						\
  		}							\
-		usleep_range((W), (W) * 2);				\
+		usleep_range(wait__, wait__ * 2);			\
+		if (wait__ < (W))					\
+			wait__ <<= 1;					\
  	}								\
  	ret__;								\
  })


I would start the period at 10us since a) sleeps below 10us are not
recommended for the usleep family, and b) most callers specify
millisecond timeouts, so polling at <10us is perhaps overkill.

Latency-sensitive callers like __intel_wait_for_register_us can be
tuned at the call site to get the behaviour they want.

For the actual GuC MMIO send, it sounds like it should pass 20us to
__intel_wait_for_register_us (going by John's explanation email) to
cover 99% of cases. Could the remaining 1% then live with a 10us
delay?

Otherwise we are effectively turning _wait_for into a partial busy
loop, or paying whatever the inefficiency of a <10us usleep is. In
practice a handful of quick loops there makes no difference, but it
feels a bit inelegant.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx