On 12/21/2017 1:34 AM, Tvrtko Ursulin wrote:
On 20/12/2017 23:50, John Harrison wrote:
On 12/20/2017 1:54 AM, Tvrtko Ursulin wrote:
What was the effect of all this on your big traces? I am only
testing with a smaller one which goes from ~3.3s to ~2.2s. On a
larger trace there might be non-linear gains due to double-sort
avoidance, unless some other effects cancel that out.
So with a trace of a shortened gem_exec_nop/basic_sequential, the
'perf script' output is 439MB and the original trace.pl before any of
the changes took ~180s. After the 'auto-detect field order' patch, it
went up to ~201s. With the optimisation patch it is down to ~129s.
However, I am also seeing differences in the HTML output since the
optimisation patch. The differences aren't massive, just slight
variations in the times. The structure is all the same, it's just that
the accounting and/or timestamps are out.
For example:
{id: 1, content: 'Ring0<br><small><br>79.48% idle<br><br>34.32%
busy<br>584.97% runnable<br>2103.60% queued<br><br>16.18%
wait<br><br>200931 batches<br>331.28us avg batch<br>331.38us avg
engine batch<br></small>'},
vs
{id: 1, content: 'Ring0<br><small><br>79.48% idle<br><br>34.32%
busy<br>584.97% runnable<br>2103.60% queued<br><br>16.18%
wait<br><br>200931 batches<br>338.56us avg batch<br>338.56us avg
engine batch<br></small>'},
Or:
{id: 58, key: -210383407, type: 'range', group: 4, subgroup: 2,
subgroupOrder: 3, content: '428/3 <small>0</small>
<small><i>???</i></small> <small><i>++</i></small> <br>142us
<small>(0us)</small>', start: '2017-01-05 21:27:45.352968', end:
'2017-01-05 21:27:45.353110', style: 'color: white; background-color:
red;'},
vs
{id: 58, key: -210383407, type: 'range', group: 4, subgroup: 2,
subgroupOrder: 3, content: '428/3 <small>0</small>
<small><i>???</i></small> <small><i>++</i></small> <br>159us
<small>(0us)</small>', start: '2017-01-05 21:27:45.352968', end:
'2017-01-05 21:27:45.353127', style: 'color: white; background-color:
red;'},
I can send you the full output and the source logs too, if useful. The
HTML output is about 840KB but, as noted, the perf logs are hundreds
of MBs.
I was able to reproduce it. I think it's down to floating point
mishandling. Neither the original nor the optimized version were
correct and both are accumulating error. I'll send an updated patch
shortly.
Regards,
Tvrtko
The new version with +0.5 gives me exactly the same results as the
previous version without the rounding. If I get rid of the int()
conversion too, I still get the same results.
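For reference, the effect being discussed can be shown in a few lines. This is an illustrative sketch (in Python rather than Perl, and not taken from trace.pl), reusing the .352968/.353110 timestamps from the example rows above; `to_us` is a stand-in for Perl's int($t * 1e6 + 0.5) half-up rounding:

```python
# Illustrative sketch, not from trace.pl: converting perf timestamps
# (float seconds) to integer microseconds before subtracting keeps the
# durations stable.

def to_us(seconds):
    # Round half-up to whole microseconds, as int($t * 1e6 + 0.5) does in Perl.
    return int(seconds * 1e6 + 0.5)

start = 45.352968   # seconds, from the 'start' field in the example above
end = 45.353110     # seconds, from the first 'end' field above

duration_us = to_us(end) - to_us(start)
print(duration_us)  # 142, matching the '142us' content string

# Without the +0.5, int() truncates after the float multiply/subtract,
# so representation error can drop a whole microsecond, and summing many
# such durations accumulates the error.
```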
What does make a difference is if I fix up the broken processing of
notify traces:
diff --git a/scripts/trace.pl b/scripts/trace.pl
index 8dbd497b..20ae3c78 100755
--- a/scripts/trace.pl
+++ b/scripts/trace.pl
@@ -382,14 +382,16 @@ while (<>) {
 	next if exists $tp{'ring'} and exists $ignore_ring{$tp{'ring'}};

-	if (exists $tp{'ring'} and exists $tp{'ctx'} and exists $tp{'seqno'}) {
+	if (exists $tp{'ring'} and exists $tp{'seqno'}) {
 		$ring = $tp{'ring'};
-		$ctx = $tp{'ctx'};
 		$seqno = $tp{'seqno'};
-		$orig_ctx = $ctx;
-		$ctx = sanitize_ctx($ctx, $ring);
-		$key = db_key($ring, $ctx, $seqno);
+		if (exists $tp{'ctx'}) {
+			$ctx = $tp{'ctx'};
+			$orig_ctx = $ctx;
+			$ctx = sanitize_ctx($ctx, $ring);
+			$key = db_key($ring, $ctx, $seqno);
+		}
 	}

 	if ($tp_name eq 'i915:i915_gem_request_wait_begin:') {
@@ -461,7 +463,7 @@ while (<>) {
 		}
 		$db{$key}->{'duration'} = $db{$key}->{'notify'} - $db{$key}->{'start'};
 		$db{$key}->{'context-complete-delay'} = $db{$key}->{'end'} - $db{$key}->{'notify'};
-	} elsif ($tp_name eq 'intel_engine_notify:') {
+	} elsif ($tp_name eq 'i915:intel_engine_notify:') {
 		$notify{global_key($ring, $seqno)} = $time;
 	} elsif ($tp_name eq 'i915:intel_gpu_freq_change:') {
 		push @freqs, [$prev_freq_ts, $time, $prev_freq] if $prev_freq;
With that, I get exactly the same HTML page as before the optimisation
patch.
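For anyone skimming the diff: 'perf script' prints the event name with its subsystem prefix, so the bare 'intel_engine_notify:' comparison never matched and the notify timestamps were silently dropped. A rough illustration (in Python, with a made-up sample line; the surrounding field layout is an assumption, not copied from a real trace):

```python
import re

# Hypothetical 'perf script'-style line; the fields around the event
# token are invented for illustration, only the token itself matters.
line = ("gem_exec_nop  1234 [002]  45.353110: "
        "i915:intel_engine_notify: dev=0, ring=0, seqno=428")

# The event token carries its subsystem prefix ('i915:...'), so a
# comparison against the bare 'intel_engine_notify:' never fires.
tp_name = re.search(r'(\S+:\S+:)\s', line).group(1)

print(tp_name == 'intel_engine_notify:')        # False: the old check misses
print(tp_name == 'i915:intel_engine_notify:')   # True: the fixed check matches
```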
John.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx