Re: [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

On 04/03/2021 02:09, Chris Wilson wrote:
Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
   in perf_event_open. This clock id is used by the perf subsystem to
   return the appropriate cpu timestamp in perf events. Similarly, let
   the user pass the clockid to this query so that cpu timestamp
   corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
   register read

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_query.c | 144 ++++++++++++++++++++++++++++++
  include/uapi/drm/i915_drm.h       |  47 ++++++++++
  2 files changed, 191 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..acca22ee6014 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@
  #include <linux/nospec.h>
+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
  #include "i915_drv.h"
  #include "i915_perf.h"
  #include "i915_query.h"
@@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
         return total_length;
  }
+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+       /*
+        * Use logic same as the perf subsystem to allow user to select the
+        * reference clock id to be used for timestamps.
+        */
+       switch (clk_id) {
+       case CLOCK_MONOTONIC:
+               return &ktime_get_ns;
+       case CLOCK_MONOTONIC_RAW:
+               return &ktime_get_raw_ns;
+       case CLOCK_REALTIME:
+               return &ktime_get_real_ns;
+       case CLOCK_BOOTTIME:
+               return &ktime_get_boottime_ns;
+       case CLOCK_TAI:
+               return &ktime_get_clocktai_ns;
+       default:
+               return NULL;
+       }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+                 i915_reg_t lower_reg,
+                 i915_reg_t upper_reg,
+                 u64 *cs_ts,
+                 u64 *cpu_ts,
+                 __ktime_func_t cpu_clock)
+{
+       u32 upper, lower, old_upper, loop = 0;
+
+       upper = intel_uncore_read_fw(uncore, upper_reg);
+       do {
+               cpu_ts[0] = cpu_clock();
+               lower = intel_uncore_read_fw(uncore, lower_reg);
+               cpu_ts[1] = cpu_clock();
+               old_upper = upper;
+               upper = intel_uncore_read_fw(uncore, upper_reg);
Both register reads comprise the timestamp returned to userspace, so
presumably you want cpu_ts[] to wrap both.

        do {
                old_upper = upper;

                cpu_ts[0] = cpu_clock();
                lower = intel_uncore_read_fw(uncore, lower_reg);
                upper = intel_uncore_read_fw(uncore, upper_reg);
                cpu_ts[1] = cpu_clock();
        } while (upper != old_upper && loop++ < 2);

Actually, if we want the best accuracy, we can bracket just the lower dword read.

We can check that the upper dword hasn't changed outside of the two cpu_clock() calls.
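
To make that concrete, here is a rough userspace sketch of the variant described above: only the lower dword read sits between the two CPU clock samples, and the upper dword is read and compared outside that window. It is not the patch code. The fake_counter struct and the read_lower(), read_upper(), cpu_clock_ns() and read_timestamps() helpers are hypothetical stand-ins for the uncore register reads and the ktime accessor, so that the carry case can be exercised without hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical stand-in for a free-running 64-bit counter exposed as two
 * 32-bit registers. Every read of the lower half advances the counter by
 * 'step', which lets a test force a carry into the upper half.
 */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t read_lower(struct fake_counter *c)
{
	c->value += c->step;
	return (uint32_t)c->value;
}

static uint32_t read_upper(const struct fake_counter *c)
{
	return (uint32_t)(c->value >> 32);
}

/* Assumption: CLOCK_MONOTONIC stands in for the user-selected clock id. */
static uint64_t cpu_clock_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Sample the counter with the CPU clock window wrapped around the lower
 * dword read only. Returns true when the upper dword was stable across
 * the final attempt, i.e. the 64-bit value is consistent.
 */
static bool read_timestamps(struct fake_counter *c, uint64_t *cycles,
			    uint64_t cpu_ts[2])
{
	uint32_t upper, lower, old_upper;
	unsigned int loop = 0;

	assert(c && cycles && cpu_ts);

	upper = read_upper(c);
	do {
		old_upper = upper;

		cpu_ts[0] = cpu_clock_ns();
		lower = read_lower(c);
		cpu_ts[1] = cpu_clock_ns();

		/* Checked outside the timed window. */
		upper = read_upper(c);
	} while (upper != old_upper && loop++ < 2);

	*cycles = ((uint64_t)upper << 32) | lower;
	return upper == old_upper;
}
```

With this shape, cpu_ts[1] - cpu_ts[0] bounds the time spent on a single register read, which is presumably the tightest window userspace can use when correlating the two time domains.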


-Lionel


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


