On Tue, Mar 01, 2016 at 10:21:45AM +0000, Tvrtko Ursulin wrote:
> 
> On 29/02/16 11:59, Tvrtko Ursulin wrote:
> >
> >On 29/02/16 11:48, Chris Wilson wrote:
> >>On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
> >>>
> >>>On 29/02/16 11:13, Chris Wilson wrote:
> >>>>On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
> >>>>>
> >>>>>On 29/02/16 10:53, Chris Wilson wrote:
> >>>>>>On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
> >>>>>>>This ok?
> >>>>>>>
> >>>>>>>"""
> >>>>>>>One unexplained result is with "gem_latency -n 0" (dispatching
> >>>>>>>empty batches) which shows 5% more throughput, 8% less CPU time,
> >>>>>>>25% better producer and consumer latencies, but 15% higher
> >>>>>>>dispatch latency which looks like a possible measuring artifact.
> >>>>>>>"""
> >>>>>>
> >>>>>>I doubt it is a measuring artefact since throughput = 1/(dispatch +
> >>>>>>wakeup latency + test overhead), and the dispatch latency here is
> >>>>>>larger than the wakeup latency and so has a greater impact on
> >>>>>>throughput in this scenario.
> >>>>>
> >>>>>I don't follow you: if dispatch latency has the larger effect on
> >>>>>throughput, how do we explain it increasing while throughput still
> >>>>>improves?
> >>>>>
> >>>>>I see this block in gem_latency:
> >>>>>
> >>>>>    measure_latency(p, &p->latency);
> >>>>>    igt_stats_push(&p->dispatch, *p->last_timestamp - start);
> >>>>>
> >>>>>measure_latency waits for the batch to complete, and then dispatch
> >>>>>latency uses p->last_timestamp, which is something written by the GPU
> >>>>>and not a CPU view of the latency?
> >>>>
> >>>>Exactly, measurements are made entirely from the running engine clock
> >>>>(which is a ~80ns clock, and should be verified during init). The
> >>>>register is read before dispatch, inside the batch and then at wakeup,
> >>>>but the information is presented as dispatch = batch - start and
> >>>>wakeup = end - batch, so to get the duration (end - start) we need
> >>>>to add the two together. Throughput will also include some overhead
> >>>>from the test iteration (that will mainly be scheduler interference).
> >>>>
> >>>>My comment about dispatch having the greater effect is in terms of
> >>>>its higher absolute value (so the relative % means a larger change wrt
> >>>>throughput).
> >>>
> >>>Change to this then?
> >>>
> >>>"""
> >>> One unexplained result is with "gem_latency -n 0" (dispatching
> >>> empty batches) which shows 5% more throughput, 8% less CPU time,
> >>> 25% better producer and consumer latencies, but 15% higher
> >>> dispatch latency which looks like an amplified effect of test
> >>> overhead.
> >>>"""
> >>
> >>No. Dispatch latency is important, and this attempts to pass the change
> >>off as a test effect when, to the best of my knowledge, it is a valid
> >>external observation of the system.
> >
> >I just don't understand how it can be valid when we have executed more
> >empty batches than before in a unit of time.
> >
> >Because even dispatch + wakeup latency is worse, yet throughput is
> >still better.
> >
> >That sounds impossible to me, so it must be the effect of using two
> >different time sources: CPU side to measure throughput and GPU side to
> >measure dispatch latency. I don't know; could you suggest a paragraph
> >to add to the commit message so we can close on this?
> 
> Happy with simply leaving out any attempt at explaining the oddity, like:
> 
> """
> One odd result is with "gem_latency -n 0" (dispatching empty
> batches) which shows 5% more throughput, 8% less CPU time, 25%
> better producer and consumer latencies, but 15% higher dispatch
> latency which is yet unexplained.
> """

Yes!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
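[Editor's illustration] A minimal C sketch of the latency bookkeeping Chris describes above, not the actual gem_latency code: it assumes all three timestamps come from the same engine clock (read before dispatch, from within the batch, and at wakeup), so dispatch = batch - start and wakeup = end - batch sum to the total duration end - start. The struct name and the tick values are hypothetical.

    /* Sketch of the engine-clock latency decomposition discussed above. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    struct sample {
        uint32_t start; /* engine clock read just before dispatch */
        uint32_t batch; /* engine clock written from within the batch */
        uint32_t end;   /* engine clock read once the waiter wakes up */
    };

    int main(void)
    {
        /* Hypothetical raw readings in engine-clock ticks (~80ns each). */
        struct sample s = { .start = 1000, .batch = 1190, .end = 1320 };

        uint32_t dispatch = s.batch - s.start; /* dispatch = batch - start */
        uint32_t wakeup   = s.end - s.batch;   /* wakeup = end - batch */
        uint32_t total    = dispatch + wakeup; /* equals end - start */

        printf("dispatch %" PRIu32 ", wakeup %" PRIu32 ", total %" PRIu32 " ticks\n",
               dispatch, wakeup, total);
        return 0;
    }

Because throughput is measured against CPU wall-clock time over the whole iteration while dispatch and wakeup are taken from the engine clock per batch, the two views need not move in the same direction, which is the discrepancy the thread leaves as "yet unexplained".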