On 04/03/2021 11:54, Chris Wilson wrote:
>>>> Actually if we want the best accuracy we can just deal with the lower dword.
>>>
>>> Accuracy of what? The lower dword read perhaps, or the accuracy of the sample point for the combined reads for the timestamp, which is closer to an external observer (cpu_clock() implies reference to an external observer).
>>>
>>> The two clock samples are not even necessarily closely related due to the nmi adjustments. If you wanted an unadjusted elapsed time for the read you can use local_clock() then return the chosen cpu_clock() before plus the elapsed delta from around the read as the estimated error.
>>>
>>>     cpu_ts[1] = local_clock();
>>>     cpu_ts[0] = cpu_clock();
>>>     lower = intel_uncore_read_fw(uncore, lower_reg);
>>>     cpu_ts[1] = local_clock() - cpu_ts[1];
>>>
>>> -Chris
>>
>> Thanks, I meant the accuracy of having 2 samples GPU/CPU as close as possible. Avoiding to account another register read in there is nice.
>>
>> My testing was also mostly done with CLOCK_MONOTONIC_RAW which doesn't seem to be adjusted like CLOCK_MONOTONIC so maybe that's why I didn't see the issue.
>
> _RAW is still adjusted for skews, just not coupled into the ntp feedback. That is less obvious than the other clocks, and why it's preferred for comparing against other HW sources. But two reads of _RAW are only monotonic, not necessarily on the same time base.
>
> local_clock() is tsc/arat, so counting the CPU cycles between the two reads with the frequency (at least on x86) held constant (and arat should be frequency invariant).
>
> If we want much better accuracy, we are supposed to use cyclecounter_t and the system_device_crosststamp.
> -Chris

Thanks for the pointers.
I think people are mostly trying to map what's coming out of OA or queries from the various command streamers back to perf/ftrace.
As far as I know perf will only let you select a clockid.
So maybe cyclecounter_t is not that useful atm.
-Lionel
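
For reference, the sampling pattern quoted above (take the reference clock, do the read, and report the elapsed time around the read as the estimated error) can be sketched in userspace C. This is only an illustration under stated assumptions, not the i915 code: clock_gettime(CLOCK_MONOTONIC) stands in for local_clock(), the caller-selected clockid stands in for cpu_clock(), and struct correlated_sample, clock_ns(), fake_device_read() and sample_correlated() are hypothetical names that do not exist in the kernel or in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result: a device value, the CPU time chosen as its
 * sample point, and the time spent around the read (error estimate). */
struct correlated_sample {
	uint64_t device_value;
	uint64_t cpu_ns;
	uint64_t read_delta_ns;
};

/* Read a POSIX clock as nanoseconds; returns 0 if the clock is unavailable. */
static uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

typedef uint64_t (*device_read_fn)(void *ctx);

/* Stand-in for a register read: returns the 64-bit value ctx points at. */
static uint64_t fake_device_read(void *ctx)
{
	return *(uint64_t *)ctx;
}

/*
 * Same ordering as the quoted snippet:
 *   start = bracket clock      (local_clock() in the kernel)
 *   cpu   = selected clock     (cpu_clock() in the kernel)
 *   value = device read
 *   delta = bracket clock - start
 */
static int sample_correlated(clockid_t id, device_read_fn read_dev,
			     void *ctx, struct correlated_sample *out)
{
	uint64_t start, end;

	if (read_dev == NULL || out == NULL)
		return -1;

	start = clock_ns(CLOCK_MONOTONIC);
	out->cpu_ns = clock_ns(id);
	out->device_value = read_dev(ctx);
	end = clock_ns(CLOCK_MONOTONIC);

	/* CLOCK_MONOTONIC must not go backwards between the two reads. */
	assert(end >= start);
	out->read_delta_ns = end - start;
	return 0;
}
```

A caller would pass the clockid it wants to correlate against (for example the one selected in perf) and treat read_delta_ns as an upper bound on how far cpu_ns may be from the actual moment of the device read. Note that read_delta_ns also includes the cost of sampling the selected clock, exactly as in the quoted ordering.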
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx