On 26/12/17 05:32, Sagar Arun Kamble wrote:
The 2ms drift was on SKL GT4.

With the patch above, I'm seeing only a ~40us drift over ~7 seconds of recording both perf tracepoints & i915 perf reports. I'm tracking the kernel tracepoints adding gem requests and the i915 perf reports.

Here is a screenshot at the beginning of the 7s recording: https://i.imgur.com/hnexgjQ.png (you can see the gem request add before the work starts in the i915 perf reports).

At the end of the recording, the gem requests appear later than the work in the i915 perf report: https://i.imgur.com/oCd0C9T.png

I'll try to prepare some IGT tests that show the drift using perf & i915 perf, so we can run those on different platforms. I tend to mostly test on a SKL GT4 & KBL GT2, but BXT definitely needs more attention...

Could we be using it wrong?

If we use the two changes highlighted above with timecounter, maybe we will get the same results as your current implementation.

Looking at clocks_calc_mult_shift(), it seems clear to me that there is less precision when using timecounter:

    /*
     * Find the conversion shift/mult pair which has the best
     * accuracy and fits the maxsec conversion range:
     */

On the other hand, there is a performance penalty for doing a div64 for every report. We can probably do better by always computing the clock using the entire delta rather than the accumulated delta.

The issue is that the reported clock cycles in the OA report are the 32-bit LSBs of the GPU TS, whereas the counter is 36 bits. Hence we will need to

You're right :) I thought maybe we could do this: look at the opening period parameter and, if it's greater than the period of timestamp wrapping, make sure we schedule some work on the kernel context to generate a context switch report (like at least once every 6 minutes on gen9).

Agree on this. Delta ns1-ns0 can be interpreted as max drift.
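To make the precision trade-off discussed above concrete, here is a minimal standalone C sketch. It is not the i915 or timecounter code. The helper names, the 12 MHz TS_FREQ_HZ value (2^32 cycles at that rate is about 358 seconds, in line with the 6 minute figure above) and the shift values are assumptions chosen only for illustration. It compares converting the entire cycle delta in one go against summing per-report conversions, and shows how a mult/shift pair loses accuracy as the shift gets smaller.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumption: 12 MHz timestamp clock, an example value only. */
#define TS_FREQ_HZ 12000000ULL

/* Exact cycles -> ns conversion; costs 64-bit divisions on every call. */
static uint64_t cycles_to_ns_div(uint64_t cycles)
{
	uint64_t secs = cycles / TS_FREQ_HZ;
	uint64_t rem = cycles % TS_FREQ_HZ;

	/* Split so the intermediate product stays within 64 bits. */
	return secs * NSEC_PER_SEC + rem * NSEC_PER_SEC / TS_FREQ_HZ;
}

/* Hypothetical helper: mult = round(ns_per_cycle * 2^shift). */
static uint32_t calc_mult(uint32_t shift)
{
	uint64_t scaled = NSEC_PER_SEC << shift;

	return (uint32_t)((scaled + TS_FREQ_HZ / 2) / TS_FREQ_HZ);
}

/*
 * Cheap conversion: ns = (cycles * mult) >> shift, truncating.
 * The caller must keep cycles * mult below 2^64.
 */
static uint64_t cycles_to_ns_mult_shift(uint64_t cycles, uint32_t mult,
					uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Accumulated delta: convert each per-report delta of 'step' cycles
 * separately and sum the results. Every conversion truncates, so the
 * rounding error grows with the number of reports.
 */
static uint64_t ns_accumulated(uint64_t total_cycles, uint64_t step)
{
	uint64_t ns = 0;

	assert(step > 0);
	while (total_cycles > 0) {
		uint64_t d = total_cycles < step ? total_cycles : step;

		ns += cycles_to_ns_div(d);
		total_cycles -= d;
	}
	return ns;
}
```

With these example values, 84,000,000 cycles (7 seconds) convert to exactly 7,000,000,000 ns when the entire delta is passed to cycles_to_ns_div(), whereas ns_accumulated(84000000, 100) yields 6,999,720,000 ns, about 280us short, purely from per-report truncation. Likewise cycles_to_ns_mult_shift() with calc_mult(8) and shift 8 ends up roughly 109us short over the same 7 seconds, while shift 24 stays within a couple of ns, at the cost of a much shorter range before the multiplication overflows.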
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx