Re: [PATCH v4] drm/i915: Optimistically spin for the request completion

On Fri, Mar 20, 2015 at 04:01:52PM +0000, Tvrtko Ursulin wrote:
> 
> On 03/20/2015 02:36 PM, Chris Wilson wrote:
> >This provides a nice boost to mesa in swap-bound scenarios (as mesa
> >throttles itself to the previous frame, which in that scenario will
> >complete shortly). It will also provide a good boost to systems running
> >with semaphores disabled and so frequently waiting on the GPU as it
> >switches rings. In the most favourable of microbenchmarks, this can
> >increase performance by around 15% - though in practice improvements
> >will be marginal and rarely noticeable.
> >
> >v2: Account for user timeouts
> >v3: Limit the spinning to a single jiffy (one scheduler tick) at most. On an
> >otherwise idle system, there is no scheduler contention and so without a
> >limit we would spin until the GPU is ready.
> >v4: Drop forcewake - the lazy coherent access doesn't require it, and we
> >have no reason to believe that the forcewake itself improves seqno
> >coherency - it only adds delay.
> >
> >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> >Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
> >Cc: "Rantala, Valtteri" <valtteri.rantala@xxxxxxxxx>
> 
> Still against a toggle switch like a simple module parameter?

Yes. I'd much rather tackle the corner cases than ignore them.
 
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 44 +++++++++++++++++++++++++++++++++++------
> >  1 file changed, 38 insertions(+), 6 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 2e17a254aac1..9988e65c1440 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -1181,6 +1181,29 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
> >  	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
> >  }
> >
> >+static int __i915_spin_request(struct drm_i915_gem_request *rq)
> >+{
> >+	unsigned long timeout;
> >+
> >+	if (i915_gem_request_get_ring(rq)->irq_refcount)
> >+		return -EBUSY;
> 
> So if someone else is already waiting on this ring, skip the spin and
> do the sleep-wait.
> 
> That would mean the earlier waiter didn't manage to spin to completion,
> so does it make sense for subsequent ones to try to spin? If we assume
> waiters arrive here in submission order then no, they should proceed to
> sleep-wait. But if waiters arrive here in random order, and that is
> purely up to userspace I think, then I am not sure?

They arrive pretty much in random order.
 
> On the other hand, if we allowed these out-of-order waiters to spin,
> that could amount to too much spinning, so maybe it is better as it is.

Also, part of my premise is that it is the cost of the irq (setting it
up and handling all the intermediate ones) that is the crux of the
issue. Once we have enabled the irq for one waiter, we may as well use
it for the whole herd. And with a herd we will want to err on the side
of sleeping more, or so my intuition says.
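
To make that concrete, the rest of the function (only its opening lines
are quoted above) behaves roughly like the sketch below. This is a
paraphrase, not the verbatim hunk; the jiffy bound and the
need_resched() bail-out are the parts that matter, and the helpers
(i915_gem_request_completed(), cpu_relax_lowlatency()) are existing
ones from this era of i915:

static int __i915_spin_request(struct drm_i915_gem_request *rq)
{
        unsigned long timeout;

        /* Somebody is already sleep-waiting on this ring: the irq is
         * set up, so let it serve the whole herd rather than spin.
         */
        if (i915_gem_request_get_ring(rq)->irq_refcount)
                return -EBUSY;

        /* Spin for at most one jiffy, yielding whenever the scheduler
         * has other work for this cpu.
         */
        timeout = jiffies + 1;
        while (!need_resched()) {
                /* Cheap, lazily-coherent seqno check. */
                if (i915_gem_request_completed(rq, true))
                        return 0;

                if (time_after_eq(jiffies, timeout))
                        break;

                cpu_relax_lowlatency();
        }

        /* One last check with full seqno coherency before we give up
         * and fall back to the irq-driven wait.
         */
        if (i915_gem_request_completed(rq, false))
                return 0;

        return -EAGAIN;
}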

I guess one test would be to see how many 1x1 [xN overdraw, say 1x1
Window, but rendering internally at 1080p] clients we can run in
parallel whilst hitting 60fps. And then whether allowing multiple
spinners helps or hinders.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx