On Mon, Nov 16, 2015 at 12:08:08PM +0000, Tvrtko Ursulin wrote: > > On 16/11/15 11:12, Chris Wilson wrote: > >On Mon, Nov 16, 2015 at 10:24:45AM +0000, Tvrtko Ursulin wrote: > >>On 15/11/15 13:32, Chris Wilson wrote: > >>>+static u64 local_clock_us(unsigned *cpu) > >>>+{ > >>>+ u64 t; > >>>+ > >>>+ *cpu = get_cpu(); > >>>+ t = local_clock() >> 10; > >> > >>Needs comment I think to explicitly mention the approximation, or > >>maybe drop the _us suffix? > > > >I did consider _approx_us but thought that was overkill. A comment along > >the lines of > >/* Approximately convert ns to us - the error is less than the > > * truncation! > > */ > > And the result is not used in subsequent calculations apart from > comparing against an approximate timeout? Exactly, the timeout is fairly arbitrary and defined in the same units. That we truncate is a much bigger cause for concern in terms of spinning accurately for a definite length of time. > >>>@@ -1161,7 +1183,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) > >>> if (signal_pending_state(state, current)) > >>> break; > >>> > >>>- if (time_after_eq(jiffies, timeout)) > >>>+ if (busywait_stop(timeout, cpu)) > >>> break; > >>> > >>> cpu_relax_lowlatency(); > >>> > >> > >>Otherwise looks good. Not sure what would you convert to 32-bit from > >>your follow up reply since you need us resolution? > > > >s/u64/unsigned long/ s/time_after64/time_after/ > > > >32bits of us resolution gives us 1000s before wraparound between the two > >samples. And I hope that a 1000s doesn't pass between loops. Or if it does, > >the GPU managed to complete its task. > > Now I see that you did say low bits.. yes that sounds fine. > > Btw while you are optimizing things maybe pick up this micro > optimization: http://patchwork.freedesktop.org/patch/64339/ > > Not in scope of this thread but under the normal development patch flow. There's a different series which looks at tackling the scalabiltiy issue with dozens of concurrent waiters. I have an equivalent patch there and one to tidy up the seqno query. > Btw2, any benchmark result changes with this? Spinning still gives the dramatic (2x) improvement in the microbenchmarks (over pure interrupt driven waits), so that improvement is preserved. There are a couple of interesting swings in the macro tests (comparedt to the previous jiffie patch) just above the noise level which could well be a change in the throttling/scheduling. (And those tests are also the ones that correspond to the greatest gains (10-40%) using spinning.) -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx