On 12/28/2017 10:43 PM, Lionel Landwerlin wrote:
On 26/12/17 05:32, Sagar Arun Kamble wrote:
On 12/22/2017 3:46 PM, Lionel Landwerlin wrote:
On 22/12/17 09:30, Sagar Arun Kamble wrote:
On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
I see two important changes: 1. approximation of the start time during
init_timecounter, 2. overflow handling in delta accumulation.
With these incorporated, I guess timecounter should also
work in the same fashion.
I think the arithmetic in timecounter is inherently lossy and
that's why we're seeing a drift.
Could you share details about the platform and scenario in which the 2ms
drift per second is being seen with timecounter?
I did not observe this on SKL.
The 2ms drift was on SKL GT4.
I have checked the timecounter arithmetic. Accuracy is very high (of
the order of micro ns per tick).
I had interpreted the maxsec parameter of clocks_calc_mult_shift, which
is used to calculate mult/shift, as the total time covered by the counter,
but it actually controls the conversion accuracy. Since we want the best
possible accuracy, passing zero should be preferred there.
For instance, below are the mult/shift values and the time reported over
10 minutes with these values for SKL GT2 at 12 MHz.
As you can see, the drift due to calculation is only about 2us. We should
check on SKL GT4 by passing zero to clocks_calc_mult_shift and using the
delta handling newly added with timecounter. 2ms is a huge
drift and it is very unlikely to be related to these calculations.
maxsec  mult        shift  tick time [ns]   total time [ns]      drift due to
                           (mult/2^shift)   (10*60*12000000      calculation
                                             * tick time)
0       2796202667  25     83.33333334326   600,000,000,071.525    71ns
3000    174762667   21     83.33333349227   600,000,001,144.409  1144ns
6000    87381333    20     83.33333301544   599,999,997,711.181  2289ns
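
The drift column can be re-derived from the mult/shift pairs alone. The following is a minimal sketch in Python (not kernel code); the helper name drift_ns is made up for illustration, and the 12 MHz frequency and 10 minute window are the ones quoted above. It uses exact rational arithmetic so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

GPU_FREQ_HZ = 12_000_000               # SKL GT2 timestamp frequency quoted in the thread
TICKS_10_MIN = 10 * 60 * GPU_FREQ_HZ   # number of ticks in the 10 minute window


def drift_ns(mult, shift, ticks=TICKS_10_MIN, freq_hz=GPU_FREQ_HZ):
    """Signed error, in ns, of ticks * mult / 2^shift against the exact duration."""
    converted = Fraction(ticks * mult, 1 << shift)
    exact = Fraction(ticks * 1_000_000_000, freq_hz)
    return float(converted - exact)


# (maxsec, mult, shift) rows taken from the table in the message
for maxsec, mult, shift in [(0, 2796202667, 25),
                            (3000, 174762667, 21),
                            (6000, 87381333, 20)]:
    print(maxsec, mult, shift, round(drift_ns(mult, shift), 3))
```

The sign is kept here: the first two pairs over-count slightly, while the maxsec=6000 pair under-counts, which is why its total time in the table is below 600,000,000,000 ns.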
With the patch above, I'm seeing only a ~40us drift over ~7 seconds of
recording both perf tracepoints & i915 perf reports.
I'm tracking the kernel tracepoints adding gem requests and the
i915 perf reports.
Here is a screenshot from the beginning of the 7s recording: https://i.imgur.com/hnexgjQ.png
(you can see the gem request add before the work starts in the
i915 perf reports).
At the end of the recording, the gem requests appear later than
the work in the i915 perf report : https://i.imgur.com/oCd0C9T.png
Looks like we need to have an error margin of only a few microseconds :)
I'll try to prepare some IGT tests that show the drift using perf &
i915 perf, so we can run those on different platforms.
I tend to mostly test on a SKL GT4 & KBL GT2, but BXT
definitely needs more attention...
Could we be using it wrong?
If we use the two changes highlighted above with timecounter, maybe
we will get the same results as your current implementation.
In the patch above, I think there is still a drift because of the
potential fractional part loss at every delta we add.
But it should only be a fraction of a nanosecond multiplied by
the number of reports over a period of time.
With a report every 1us, that should still be much less than
1ms of drift over 1s.
The timecounter interface takes care of fractional parts, so that
should help us.
We can either go with timecounter or with our own implementation,
provided the conversions are precise.
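
To illustrate the fractional-part point, here is a minimal Python sketch (not the kernel implementation). It compares converting each delta independently and truncating against carrying the sub-nanosecond remainder forward between conversions, which is the idea behind timecounter's fractional handling. The function names and the choice of 13-tick deltas are assumptions made for the example; the mult/shift pair is the maxsec=0 row quoted earlier in the thread.

```python
MULT, SHIFT = 2796202667, 25   # maxsec=0 pair for 12 MHz, as quoted in the thread


def accumulate_truncating(deltas, mult=MULT, shift=SHIFT):
    """Convert each delta on its own; the fraction of a ns is dropped every time."""
    total_ns = 0
    for delta in deltas:
        total_ns += (delta * mult) >> shift
    return total_ns


def accumulate_with_frac(deltas, mult=MULT, shift=SHIFT):
    """Carry the low 'shift' bits (the sub-ns remainder) into the next conversion."""
    mask = (1 << shift) - 1
    total_ns = 0
    frac = 0
    for delta in deltas:
        scaled = delta * mult + frac
        frac = scaled & mask          # remainder kept for the next delta
        total_ns += scaled >> shift
    return total_ns


# One million reports, each 13 ticks (about 1.08us) after the previous one.
deltas = [13] * 1_000_000
lost_ns = accumulate_with_frac(deltas) - accumulate_truncating(deltas)
print(lost_ns)
```

With these inputs the truncating variant ends up 333333 ns behind over roughly 1.08 s, which fits the estimate above: less than 1 ns lost per report, so well under 1ms per second at one report per microsecond.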
Looking at clocks_calc_mult_shift(), it seems clear to me that
there is less precision when using timecounter :
/*
* Find the conversion shift/mult pair which has the best
* accuracy and fits the maxsec conversion range:
*/
We can improve upon this by passing zero as maxsec to
clocks_calc_mult_shift.
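
For experimenting with the effect of maxsec outside the kernel, the following is a Python transliteration of clocks_calc_mult_shift() as I read it in kernel/time/clocksource.c. Treat it as a sketch under that assumption rather than an authoritative copy; the argument names from_hz and to_hz are mine.

```python
def clocks_calc_mult_shift(from_hz, to_hz, maxsec):
    """Return (mult, shift) such that to_units ~= (from_ticks * mult) >> shift."""
    # Each bit that maxsec * from_hz needs above 32 bits removes one bit
    # from the space available for mult.
    sftacc = 32
    tmp = (maxsec * from_hz) >> 32
    while tmp:
        tmp >>= 1
        sftacc -= 1

    # Take the largest shift whose rounded mult still fits in sftacc bits.
    for sft in range(32, 0, -1):
        mult = ((to_hz << sft) + from_hz // 2) // from_hz
        if (mult >> sftacc) == 0:
            break
    else:
        sft = 0   # mirrors the C loop falling through with sft == 0
    return mult, sft


for maxsec in (0, 3000, 6000):
    print(maxsec, clocks_calc_mult_shift(12_000_000, 1_000_000_000, maxsec))
```

For a 12 MHz source converted to nanoseconds, this reproduces the three mult/shift pairs listed earlier in the thread, with maxsec=0 giving the largest shift and therefore the finest resolution.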
On the other hand, there is a performance penalty for doing a div64 for
every report.
We can probably do better by always computing the clock using the
entire delta rather than the accumulated delta.
The issue is that the clock cycle count reported in the OA report is the
32 LSBs of the GPU timestamp, whereas the counter is 36 bits wide. Hence
we will need to accumulate the delta. Of course, this assumes that two
reports are never spaced a count of 0xffffffff or more apart.
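
The accumulation described here can be sketched as follows (Python, with made-up helper names). It assumes, as stated above, that consecutive reports are less than 2^32 ticks apart, so a modulo-2^32 subtraction recovers the true delta even across a wrap of the 32-bit field.

```python
TS32_MASK = 0xFFFFFFFF   # the OA report carries only the low 32 bits of the timestamp


def delta32(prev_ts, cur_ts):
    """Ticks elapsed between two 32-bit timestamps, tolerating one wrap."""
    return (cur_ts - prev_ts) & TS32_MASK


def accumulate_ticks(report_timestamps):
    """Sum wrap-safe deltas over a sequence of 32-bit report timestamps."""
    total = 0
    prev = report_timestamps[0]
    for cur in report_timestamps[1:]:
        total += delta32(prev, cur)
        prev = cur
    return total


# The 32-bit value wraps twice here, yet the accumulated total exceeds 2^32.
print(hex(accumulate_ticks([0xFFFFFF00, 0x00000100, 0x80000100, 0x00000200])))
```

If two reports were 2^32 ticks or more apart, delta32 would silently under-count, which is the failure mode the assumption above rules out.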
You're right :)
I thought maybe we could do this :
Look at the opening period parameter, if it's superior to the
period of timestamps wrapping, make sure we schedule some work on
kernel context to generate a context switch report (like at least
once every 6 minutes on gen9).
Looks fine to me.
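
As a quick check of the "6 minutes" figure, here is a small Python sketch. The 12 MHz frequency is the SKL GT2 value quoted earlier in the thread and is only an assumption for other gen9 parts; both function names are hypothetical.

```python
def wrap_period_s(counter_bits, freq_hz):
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << counter_bits) / freq_hz


def needs_forced_report(sampling_period_s, counter_bits=32, freq_hz=12_000_000):
    """True if periodic reports alone could be spaced further apart than one wrap."""
    return sampling_period_s > wrap_period_s(counter_bits, freq_hz)


print(wrap_period_s(32, 12_000_000) / 60)   # wrap period of the 32-bit field, in minutes
```

At 12 MHz the 32-bit field wraps after about 358 s, just under 6 minutes, so any sampling period longer than that would need the extra context switch report suggested above.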
Agree on this. The delta ns1-ns0 can be interpreted as the max
drift.
Measurements on my KBL system
were on the order of a few microseconds (~30us).
I guess we might be able to set up the correlation point
better (masking interrupts?) to reduce the delta.
We are already using spin_lock. Do you mean NMI?
I don't actually know much on this point.
if spin_lock is the best we can do, then that's it :)
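
The code being discussed is not quoted in this message, so the following Python sketch only shows one reading of ns0 and ns1: the CPU clock is sampled immediately before and after the GPU timestamp read, and ns1-ns0 bounds the uncertainty of the correlation point. The read_gpu_timestamp callable is a hypothetical stand-in for the register read.

```python
import time


def sample_correlation_point(read_gpu_timestamp):
    """Bracket a GPU timestamp read with two CPU clock reads.

    Returns (ns0, gpu_ts, uncertainty_ns), where uncertainty_ns = ns1 - ns0
    is an upper bound on how far gpu_ts can be from the CPU time ns0.
    """
    ns0 = time.monotonic_ns()
    gpu_ts = read_gpu_timestamp()    # hypothetical: stands in for the register read
    ns1 = time.monotonic_ns()
    return ns0, gpu_ts, ns1 - ns0


ns0, gpu_ts, uncertainty_ns = sample_correlation_point(lambda: 0x1234)
print(gpu_ts, uncertainty_ns)
```

Anything that can preempt the code between the two CPU clock reads widens the bound, which is why the question of what the lock does and does not mask matters here.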
Thanks,
- Lionel
On 07/12/17 00:57, Robert Bragg wrote:
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx