Re: [PATCH] drm/i915: Use exponential backoff for wait_for()

Quoting Sagar Arun Kamble (2017-11-21 16:49:37)
> 
> 
> On 11/21/2017 10:03 PM, Chris Wilson wrote:
> > Quoting Sagar Arun Kamble (2017-11-21 16:29:57)
> >>
> >> On 11/21/2017 8:54 PM, Chris Wilson wrote:
> >>> Instead of sleeping for a fixed 1ms (roughly, depending on timer slack),
> >>> start with a small sleep and exponentially increase the sleep on each
> >>> cycle.
> >>>
> >>> A good example of a beneficiary is the guc mmio communication channel.
> >>> Typically we expect (and so spin) for 10us for a quick response, but this
> >>> doesn't cover everything, and so sometimes we fall back to the millisecond+
> >>> sleep. This incurs a significant delay in time-critical operations like
> >>> preemption (igt/gem_exec_latency), which can be improved significantly by
> >>> using a small sleep after the spin fails.
> >>>
> >>> We've made this suggestion many times, but had little experimental data
> >>> to support adding the complexity.
> >>>
> >>> References: 1758b90e38f5 ("drm/i915: Use a hybrid scheme for fast register waits")
> >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>> Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
> >>> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> >>> Cc: Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx>
> >>> ---
> >>>    drivers/gpu/drm/i915/intel_drv.h | 5 ++++-
> >>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> >>> index 69aab324aaa1..c1ea9a009eb4 100644
> >>> --- a/drivers/gpu/drm/i915/intel_drv.h
> >>> +++ b/drivers/gpu/drm/i915/intel_drv.h
> >>> @@ -50,6 +50,7 @@
> >>>     */
> >>>    #define _wait_for(COND, US, W) ({ \
> >>>        unsigned long timeout__ = jiffies + usecs_to_jiffies(US) + 1;   \
> >>> +     long wait__ = 1;                                                \
> >>>        int ret__;                                                      \
> >>>        might_sleep();                                                  \
> >>>        for (;;) {                                                      \
> >>> @@ -62,7 +63,9 @@
> >>>                        ret__ = -ETIMEDOUT;                             \
> >>>                        break;                                          \
> >>>                }                                                       \
> >>> -             usleep_range((W), (W) * 2);                             \
> >>> +             usleep_range(wait__, wait__ * 2);                       \
> >>> +             if (wait__ < (W))                                       \
> >> This should be "wait__ < (US)" ?
> > US is the total timeout, W is the sleep interval, now used to provide
> > a cap on the backoff (i.e. a minimum polling frequency).
> Oh, right. I thought we wanted to exponentially run down the total
> timeout; that might not be the correct approach.
> Hybrid polling might be useful here, but for each use of wait_for we
> would need to know the expected wait time precisely.
> Then we could sleep for half the total time and then poll :)
> Should users needing faster completion use wait_for_atomic with a
> higher timeout in us?

The majority of users are expected to use intel_wait_for_register*(), so
that we don't scatter large numbers of this big macro expansion through
the code. There we already make a basic attempt at a spin/poll hybrid.
Sleeping for half the expected period and then spinning would be fun --
except that in most cases we have no idea what the expected period is :)
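
Roughly, the loop body then behaves like the sketch below. Note that the
doubling step itself was clipped from the quote above, so the
"wait__ <<= 1" line is an assumption about the unquoted remainder of
the patch:

	long wait__ = 1;		/* start with a ~1us sleep */
	for (;;) {
		if (COND) {
			ret__ = 0;
			break;
		}
		if (time_after(jiffies, timeout__)) {
			ret__ = -ETIMEDOUT;
			break;
		}
		usleep_range(wait__, wait__ * 2);
		if (wait__ < (W))
			wait__ <<= 1;	/* 1, 2, 4, ... us, capped at W */
	}

E.g. with W = 1000, the sleep lengths run 1, 2, 4, ... us up to roughly
a millisecond, so a quick ack costs microseconds instead of the old
fixed 1ms.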

If you did have an expected period, then usleep(expected_period >>= 1)
would be a good heuristic indeed.

if (expected) {
	do {
		int tmp;

		if (COND)
			break;		/* condition met, success */

		/* sleep between half and all of the remaining expectation */
		tmp = expected >> 1;
		usleep_range(tmp, expected);
		expected = tmp;
	} while (expected);
}
(then the current wait_for loops)
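
For example, with expected = 80us that gives sleeps of roughly 40, 20,
10, 5, 2 and 1us -- about one expected period in total -- before
falling through to the polling loop below.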

But we would also need to feed that knowledge in from somewhere. (Not a
bad idea, just needs a use case to start the ball rolling. Wait for vblank
might be one?)
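
Purely as illustration (the name wait_for_expected and the EXPECTED_US
parameter are invented here, not existing API), the hook could look
like:

#define wait_for_expected(COND, MS, EXPECTED_US) ({			\
	long expected__ = (EXPECTED_US);				\
	/* run the expectation down by halves, as sketched above */	\
	while (expected__ && !(COND)) {					\
		long tmp__ = expected__ >> 1;				\
		usleep_range(tmp__, expected__);			\
		expected__ = tmp__;					\
	}								\
	/* then fall back to the normal polling wait */			\
	wait_for(COND, MS);						\
})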

Instead, our assumption is that mmio completes and is acked immediately,
and we back off from that assumption when it fails.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



