Re: [PATCH] drm/i915: Optimistically spin for the request completion

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 11 Mar 2015 10:30:13 +0000

On Wed, Mar 11, 2015 at 11:13:59AM +0100, Daniel Vetter wrote:
> On Tue, Mar 10, 2015 at 04:14:14PM +0000, Chris Wilson wrote:
> > On Tue, Mar 10, 2015 at 04:06:19PM +0000, Chris Wilson wrote:
> > > @@ -1235,12 +1257,20 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> > >  	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
> > >  		gen6_rps_boost(dev_priv, file_priv);
> > >  
> > > -	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
> > > -		return -ENODEV;
> > > -
> > >  	/* Record current time in case interrupted by signal, or wedged */
> > >  	trace_i915_gem_request_wait_begin(req);
> > >  	before = ktime_get_raw_ns();
> > > +
> > > +	/* Optimistic spin before touching IRQs */
> > Perhaps iff timeout == NULL, or pass it along and add a
> > 
> > if (timeout && timeout_after_eq(jiffies, timeout))
> > 	break;
> > 
> > before the cpu_relax()?
> 
> I guess the answer for that is asking how many apps use short
> opportunistic waits in the frame rendering loop, e.g. to wait for query
> results and fall back to the ones from the previous frame if they're not
> available?

My presumption is that this would help most with occlusion-query-bound
applications (in terms of real-world impact). Applications that have
readback in their critical path aren't really that exciting...

I was trying to see if Cities: Skylines would benefit... But that seems
broken with multi-monitor setups :|
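
For reference, the spin-with-timeout idea quoted above can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not the i915 code: relax(), now_ns(), spin_for_completion() and the
spin_budget_ns parameter are hypothetical names, the completion flag stands
in for the request's seqno check, and the deadline is an absolute
CLOCK_MONOTONIC time in nanoseconds rather than jiffies.

```c
#define _POSIX_C_SOURCE 199309L

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's cpu_relax(): a compiler barrier. */
static inline void relax(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-wait until *done is set, the spin budget is used up, or the
 * optional caller deadline (NULL means "no timeout") has passed.
 *
 * Returns true if completion was observed while spinning. On false the
 * caller would fall back to the slow path (enable the interrupt, sleep).
 */
static bool spin_for_completion(atomic_bool *done,
				const uint64_t *deadline_ns,
				uint64_t spin_budget_ns)
{
	const uint64_t spin_end = now_ns() + spin_budget_ns;

	for (;;) {
		uint64_t now;

		if (atomic_load_explicit(done, memory_order_acquire))
			return true;

		now = now_ns();
		if (now >= spin_end)
			break;

		/* The extra check suggested above: honour the timeout. */
		if (deadline_ns && now >= *deadline_ns)
			break;

		relax();
	}

	/* One last look before giving up and taking the slow path. */
	return atomic_load_explicit(done, memory_order_acquire);
}
```

A caller would try spin_for_completion() first and only set up the
interrupt and sleep when it returns false, so a request that completes
within the spin budget never pays the IRQ cost.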

> Also do you have microbenchmark numbers for something mildly ridiculous
> like a loop of very short batches (enough ofc to cause a bit of delay) and
> immediately stalling for them? It's definitely an awesome idea given that
> every other lock and sync primitive does it too.

Urm, you are describing exactly how mesa behaves in swap benchmarks.
Admittedly there isn't much room for improvement after the throttle
adjustment patches land in mesa, but it is still there. It does increase
CPU load greatly in memory bound swap benchmarks, and I wonder how much
of the performance increase is from keeping the CPU from going to sleep
(i.e. preventing cpufreq from destroying the benchmark). I guess I have
an exciting morning of letting synmark run on one machine in various
configs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx