Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

On 12/7/2017 6:18 AM, Robert Bragg wrote:


On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx> wrote:
We can compute the system time corresponding to a GPU timestamp by taking a
reference point (CPU monotonic time, GPU timestamp) and then adding the
delta time computed using the timecounter/cyclecounter support in the kernel.
We have to configure the cyclecounter with the GPU timestamp frequency.
The earlier approach based on cross-timestamps is not needed. It was being
used to approximate the frequency based on invalid assumptions (the drift
seen in the time was possibly due to a precision issue). The precision of
time from the GPU clock is already in ns, and the timecounter takes care of
it, as verified over variable durations.
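For reference, a minimal sketch of this approach (the helper and variable
names here are assumptions for illustration, not the actual patch contents),
wiring the kernel's cyclecounter/timecounter API to a 36-bit GPU timestamp
counter running at the 12 MHz rate discussed later in this thread:

#include <linux/clocksource.h>
#include <linux/timecounter.h>
#include <linux/timekeeping.h>

#define GPU_TS_FREQ_HZ	12000000	/* 12 MHz timestamp clock */

/* Hypothetical MMIO read of the 36-bit TIMESTAMP register. */
extern u64 read_gpu_ts_register(void);

static u64 i915_read_gpu_timestamp(const struct cyclecounter *cc)
{
	return read_gpu_ts_register();
}

static struct cyclecounter gpu_cc = {
	.read = i915_read_gpu_timestamp,
	.mask = CYCLECOUNTER_MASK(36),	/* counter wraps at 2^36 */
};

static struct timecounter gpu_tc;

static void gpu_timecounter_init(void)
{
	/* Derive mult/shift so 12 MHz cycles convert to nanoseconds. */
	clocks_calc_mult_shift(&gpu_cc.mult, &gpu_cc.shift,
			       GPU_TS_FREQ_HZ, NSEC_PER_SEC, 3600);
	/* Anchor the counter to the current CLOCK_MONOTONIC time. */
	timecounter_init(&gpu_tc, &gpu_cc, ktime_get_ns());
}

/* System time corresponding to a raw GPU timestamp from an OA report. */
static u64 gpu_ts_to_mono_ns(u64 gpu_ts)
{
	return timecounter_cyc2time(&gpu_tc, gpu_ts);
}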

Hi Sagar,

I have some doubts about this analysis...

The intent behind Sourab's original approach was to be able to determine the frequency at runtime empirically, because the constants we have aren't particularly accurate. Without a perfectly stable frequency that's known very precisely, an interpolated correlation will inevitably drift, and I think the nature of HW implies we can't expect to have either of those. Beyond that, the general idea had been to try and use existing kernel infrastructure for a problem that isn't unique to GPU clocks.
Hi Robert,

Testing on SKL shows the timestamps drift only about 10us over roughly 30 minutes of sampling done in the kernel.
Verified with changes from https://github.com/sakamble/i915-timestamp-support/commits/drm-tip
Note that since we are sampling the counter through debugfs, the read overhead is likely adding to the drift, so some adjustment might be needed.
With OA reports, though, we only have to worry about the initial timecounter setup, where we need an accurate pair of system time and GPU timestamp counter values.
I think the timestamp clock is highly stable and we don't need logic to determine the frequency at runtime. I will try to get confirmation from the HW team as well.

If we do need to determine the frequency, Sourab's approach needs to be refined as follows:
1. It can be implemented entirely in i915, because all we need is a pair of system time and GPU clock readings taken over different durations.
2. The crosstimestamp framework usage in that approach is incorrect: ideally we should be passing an ART counter value alongside the GPU counter, but instead we were hacking it to pass the TSC.
Quoting Thomas from https://patchwork.freedesktop.org/patch/144298/:

    get_device_system_crosststamp() is for timestamps taken via a clock which is directly correlated with the timekeeper clocksource.
    ART and TSC are correlated via: TSC = (ART * scale) + offset
    get_device_system_crosststamp() invokes the device function which reads ART, which is converted to CLOCK_MONOTONIC_RAW by the conversion above, and then uses interpolation to map the CLOCK_MONOTONIC_RAW value to CLOCK_MONOTONIC.
    The device function does not know anything about TSC. All it knows about is ART.

I am not aware whether the GPU timestamp clock is correlated with TSC the way ART is for ethernet drivers, or whether i915 can read ART the way ethernet drivers do. (See the sketch after this list for what a correct device callback would have to look like.)
3. I have seen precision issues in the calculations in i915_perf_clock_sync_work, and in its usage of MONOTONIC_RAW, which might make the derived time jump.
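To make Thomas's point concrete, here is a rough sketch of what a correct
get_device_system_crosststamp() callback looks like for a device whose
counter really is correlated with ART, as on PTP-capable ethernet NICs.
All device-side helpers below are hypothetical; as noted above, i915 has no
known mechanism to latch such a pair today:

#include <linux/ktime.h>
#include <linux/timekeeping.h>
#include <asm/tsc.h>	/* convert_art_to_tsc(), x86 only */

/* Hypothetical HW support that atomically latches (device counter, ART). */
extern void read_device_art_pair(u64 *dev_cycles, u64 *art);
/* Hypothetical conversion of device cycles to nanoseconds. */
extern u64 gpu_cycles_to_ns(u64 cycles);

static int gpu_get_time(ktime_t *device_time,
			struct system_counterval_t *system_counterval,
			void *ctx)
{
	u64 art, dev_cycles;

	read_device_art_pair(&dev_cycles, &art);

	*device_time = ns_to_ktime(gpu_cycles_to_ns(dev_cycles));
	/* The callback hands back an ART-derived clocksource value;
	 * it never reads the TSC directly. */
	*system_counterval = convert_art_to_tsc(art);
	return 0;
}

static int gpu_crosststamp(struct system_device_crosststamp *xt)
{
	return get_device_system_crosststamp(gpu_get_time, NULL, NULL, xt);
}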

That's not to say that a more limited, simpler solution based on frequent re-correlation wouldn't be more than welcome if tracking an accurate frequency is too awkward for now, but I think some things need to be considered in that case:

Adjusting the timecounter time can be another option, if we confirm that the GPU timestamp frequency is stable.

- It would be good to quantify the kind of drift seen in practice to know how frequently it's necessary to re-synchronize. It sounds like you've done this ("as verified over variable durations") so I'm curious what kind of drift you saw. I'd imagine you would see a significant drift over, say, one second and it might not take much longer for the drift to even become clearly visible to the user when plotted in a UI. For reference I once updated the arb_timer_query test in piglit to give some insight into this drift (https://lists.freedesktop.org/archives/piglit/2016-September/020673.html) and at least from what I wrote back then it looks like I was seeing a drift of a few milliseconds per second on SKL. I vaguely recall it being much worse given the frequency constants we had for Haswell.

On SKL I have seen a very small drift of less than 10us over a period of 30 minutes.
Verified with changes from https://github.com/sakamble/i915-timestamp-support/commits/drm-tip

The 36-bit counter will overflow after about 95 minutes at 12 MHz (2^36 / 12,000,000 Hz ~= 5726 s). The timecounter framework treats a counter delta of more than half the counter's total range as time in the past, so the current approach only works for less than ~45 minutes after timecounter init.
We will need to add overflow-watchdog support like other drivers, which simply refresh the timecounter before the ~45-minute mark.
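A sketch of such an overflow watchdog (names assumed, reusing gpu_tc from
the earlier sketch), along the lines of what network drivers with
free-running PTP counters do: periodically call timecounter_read() well
inside the half-wrap window so the accumulated delta is never ambiguous.

#include <linux/timecounter.h>
#include <linux/workqueue.h>

/* Run well within the ~45 min half-wrap window; ~20 min leaves margin. */
#define GPU_TC_WATCHDOG_PERIOD	(20 * 60 * HZ)

static struct delayed_work gpu_tc_work;

static void gpu_tc_watchdog(struct work_struct *work)
{
	/* Folds the elapsed cycles into the timecounter's ns base and
	 * updates cycle_last, so later deltas stay unambiguous. */
	timecounter_read(&gpu_tc);
	schedule_delayed_work(&gpu_tc_work, GPU_TC_WATCHDOG_PERIOD);
}

static void gpu_tc_watchdog_start(void)
{
	INIT_DELAYED_WORK(&gpu_tc_work, gpu_tc_watchdog);
	schedule_delayed_work(&gpu_tc_work, GPU_TC_WATCHDOG_PERIOD);
}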

- What guarantees will be promised about monotonicity of correlated system timestamps? Will it be guaranteed that sequential reports must have monotonically increasing timestamps? That might be fiddly if the gpu + system clock are periodically re-correlated, so it might be good to be clear in documentation that the correlation is best-effort only for the sake of implementation simplicity. That would still be good for a lot of UIs I think and there's freedom for the driver to start simple and potentially improve later by measuring the gpu clock frequency empirically.

If we rely on the timecounter alone, without re-correlation to determine the frequency, setting the init time to MONOTONIC system time should take care of the monotonicity of the correlated times.
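If stronger guarantees were still wanted under periodic re-correlation, a
minimal sketch (an assumed helper, not part of this series) of clamping the
emitted timestamps so userspace never sees time go backwards across a
re-sync:

static u64 last_emitted_ns;

static u64 emit_monotonic_ns(u64 ns)
{
	if (ns < last_emitted_ns)
		ns = last_emitted_ns;	/* never step backwards across re-sync */
	else
		last_emitted_ns = ns;
	return ns;
}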

Currently only one correlated pair of timestamps is read, when enabling the stream, and so a relatively long time is likely to pass before the stream is disabled (seconds or minutes while a user is running a system profiler). It seems very likely to me that these clocks are going to drift significantly without introducing some form of periodic re-synchronization based on some understanding of the drift that's seen.
 
Br,
- Robert

Regards,
Sagar



This series adds base timecounter/cyclecounter changes and changes to
get GPU and CPU timestamps in OA samples.

Sagar Arun Kamble (1):
  drm/i915/perf: Add support to correlate GPU timestamp with system time

Sourab Gupta (3):
  drm/i915/perf: Add support for collecting 64 bit timestamps with OA
    reports
  drm/i915/perf: Extract raw GPU timestamps from OA reports
  drm/i915/perf: Send system clock monotonic time in perf samples

 drivers/gpu/drm/i915/i915_drv.h  |  11 ++++
 drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h  |   6 ++
 include/uapi/drm/i915_drm.h      |  14 +++++
 4 files changed, 154 insertions(+), 1 deletion(-)

--
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

