Re: [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> · Thu, 4 Mar 2021 12:27:36 +0200

    On 04/03/2021 11:54, Chris Wilson wrote:

            Actually if we want the best accuracy we can just deal with the lower dword.

          Accuracy of what? The lower dword read perhaps, or the accuracy of the
sample point for the combined reads for the timestamp, which is closer
to an external observer (cpu_clock() implies reference to an external
observer).

The two clock samples are not even necessarily closely related due to the
nmi adjustments. If you wanted an unadjusted elapsed time for the read
you can use local_clock() then return the chosen cpu_clock() before plus
the elapsed delta from around the read as the estimated error.

cpu_ts[1] = local_clock();
cpu_ts[0] = cpu_clock();
lower = intel_uncore_read_fw(uncore, lower_reg);
cpu_ts[1] = local_clock() - cpu_ts[1];
-Chris
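
For anyone wanting to try the bracketing idea outside the kernel, here is a minimal userspace sketch of the same pattern. It is not the driver code: CLOCK_MONOTONIC_RAW stands in for local_clock() as an assumption, the reported clock is whatever clockid the caller picks, and read_lower_dword() is a hypothetical placeholder for intel_uncore_read_fw(uncore, lower_reg). The struct and helper names are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* One correlated sample: the CPU timestamp taken just before the device
 * read, the value read, and how long the window around the read took. */
struct correlated_sample {
	uint64_t cpu_ns;    /* chosen CPU clock, sampled before the read */
	uint64_t window_ns; /* elapsed time around the read: estimated error */
	uint32_t lower;     /* value returned by the (placeholder) read */
};

static uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical placeholder for the register read; it only returns the low
 * 32 bits of a clock so the sketch has something to sample. */
static uint32_t read_lower_dword(void)
{
	return (uint32_t)clock_ns(CLOCK_MONOTONIC_RAW);
}

static struct correlated_sample sample_once(clockid_t report_clock)
{
	struct correlated_sample s;
	uint64_t start;

	start = clock_ns(CLOCK_MONOTONIC_RAW);   /* cpu_ts[1] = local_clock() */
	s.cpu_ns = clock_ns(report_clock);       /* cpu_ts[0] = cpu_clock() */
	s.lower = read_lower_dword();            /* the register read */
	s.window_ns = clock_ns(CLOCK_MONOTONIC_RAW) - start; /* elapsed delta */
	return s;
}
```

A caller would treat cpu_ns as the sample point and window_ns as an upper bound on how far the device read can be from it.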

        Thanks,

I meant the accuracy of having 2 samples GPU/CPU as close as possible.

Not having to account for another register read in there is nice.

My testing was also mostly done with CLOCK_MONOTONIC_RAW, which doesn't
seem to be adjusted like CLOCK_MONOTONIC, so maybe that's why I didn't see
the issue.

      _RAW is still adjusted for skews, just not coupled into the ntp feedback.
That is less obvious than the other clocks, and why it's preferred for
comparing against other HW sources. But two reads of _RAW are only
monotonic, not necessarily on the same time base. local_clock() is
tsc/arat, so counting the CPU cycles between the two reads with the
frequency (at least on x86) held constant (and arat should be frequency
invariant).

If we want much better accuracy, we are supposed to use cyclecounter_t
and the system_device_crosststamp.
-Chris

    Thanks for the pointers.
    I think people are mostly trying to map what's coming out of OA or
    queries from the various command streamers back to perf/ftrace.
    As far as I know, perf will only let you select a clockid.

    So maybe cyclecounter_t is not that useful atm.

    -Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx