On 01/03/16 10:32, Chris Wilson wrote:
On Tue, Mar 01, 2016 at 10:21:45AM +0000, Tvrtko Ursulin wrote:
On 29/02/16 11:59, Tvrtko Ursulin wrote:
On 29/02/16 11:48, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 11:40:37AM +0000, Tvrtko Ursulin wrote:
On 29/02/16 11:13, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 11:01:49AM +0000, Tvrtko Ursulin wrote:
On 29/02/16 10:53, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 10:45:34AM +0000, Tvrtko Ursulin wrote:
This ok?
"""
One unexplained result is with "gem_latency -n 0" (dispatching
empty batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which looks like a possible measuring artifact.
"""
I doubt it is a measuring artefact since throughput = 1/(dispatch +
latency + test overhead), and the dispatch latency here is larger than
the wakeup latency and so has greater impact on throughput in this
scenario.
I don't follow you. If dispatch latency has the larger effect on
throughput, how do we explain that it increased and throughput still
got better?
I see in gem_latency this block:
measure_latency(p, &p->latency);
igt_stats_push(&p->dispatch, *p->last_timestamp - start);
measure_latency waits for the batch to complete, and then dispatch
latency uses p->last_timestamp, which is something written by the GPU
and not a CPU view of the latency?
Exactly, measurements are entirely made from the running engine clock
(which is ~80ns clock, and should be verified during init). The register
is read before dispatch, inside the batch and then at wakeup, but the
information is presented as dispatch = batch - start and
wakeup = end - batch, so to get the duration (end - start) we need
to add the two together. Throughput will also include some overhead from
the test iteration (that will mainly be scheduler interference).
My comment about dispatch having the greater effect is in terms of
its higher absolute value (so the same relative % means a larger change
wrt throughput).
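To make the arithmetic above concrete, here is a minimal sketch of how
the three timestamp reads combine. It is not code from gem_latency: the
struct and function names are hypothetical, and it assumes the engine
timestamp is a free-running 32-bit counter, so unsigned subtraction
stays correct across a wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three engine-clock reads described
 * above; the names are illustrative, not taken from gem_latency. */
struct sample {
	uint32_t start; /* read before dispatch */
	uint32_t batch; /* written from inside the batch */
	uint32_t end;   /* read at wakeup */
};

/* dispatch = batch - start (unsigned, so a counter wrap is harmless) */
static uint32_t dispatch_ticks(const struct sample *s)
{
	return s->batch - s->start;
}

/* wakeup = end - batch */
static uint32_t wakeup_ticks(const struct sample *s)
{
	return s->end - s->batch;
}

/* duration (end - start) is the sum of the two reported latencies */
static uint32_t duration_ticks(const struct sample *s)
{
	return dispatch_ticks(s) + wakeup_ticks(s);
}

/* throughput = 1/(dispatch + latency + test overhead), in iterations
 * per second with all inputs in nanoseconds. */
static double throughput_per_sec(double dispatch_ns, double wakeup_ns,
				 double overhead_ns)
{
	double period_ns = dispatch_ns + wakeup_ns + overhead_ns;

	assert(period_ns > 0.0);
	return 1e9 / period_ns;
}
```

Under this model throughput can only improve while dispatch + wakeup
gets worse if the third term, the per-iteration test overhead, shrinks
by more than the latencies grow.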
Change to this then?
"""
One unexplained result is with "gem_latency -n 0" (dispatching
empty batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which looks like an amplified effect of test
overhead.
"""
No. Dispatch latency is important and this attempts to pass the change
off as a test effect when, to the best of my knowledge, it is a valid
external observation of the system.
I just don't understand how it can be valid when we have executed more
empty batches than before in a unit of time.
Because even dispatch + wake up latency is worse, but throughput is
still better.
Sounds impossible to me, so it must be the effect of using two different
time sources: CPU side to measure throughput and GPU side to measure
dispatch latency.
I don't know, could you suggest a paragraph to add to
the commit message so we can close on this?
Happy with simply leaving out any attempt at explaining the oddity, like:
"""
One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time, 25%
better producer and consumer latencies, but 15% higher dispatch
latency which is yet unexplained.
"""
Yes!
Thanks! Patch merged.
I'll try CSB read outside the execlists lock to see if that helps any
(into a temporary buffer).
What about your patch to move it all to a bottom handler? Are we going
to progress that one?
Regards,
Tvrtko