Re: [PATCH v16 08/17] drm/i915/perf: Add OA unit support for Gen 8+

Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> · Mon, 12 Jun 2017 18:33:42 +0100

On 12/06/17 18:23, Matthew Auld wrote:
On 5 June 2017 at 15:48, Lionel Landwerlin
<lionel.g.landwerlin@xxxxxxxxx> wrote:
From: Robert Bragg <robert@xxxxxxxxxxxxx>

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
     lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
     context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
     (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
     batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
     batchbuffer programming the OA unit doesn't actually work)
     (Lionel)
     Pin context before updating context image (Chris)
     Drop MMIO programming now that we switch to a kernel context with
     right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
      configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
      than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
      user contexts and reconfiguring the kernel context through the
      batchbuffer and updating all the other contexts' context image.
      Also take care to lock slice/subslice configuration when OA is
      on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
      (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
      configuration changes in a later patch (Lionel)
      Remove usleep after programming the NOA configs on Gen8+, this
      doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx> \o/
---
  drivers/gpu/drm/i915/i915_drv.h  |   46 +-
  drivers/gpu/drm/i915/i915_perf.c | 1032 +++++++++++++++++++++++++++++++++++---
  drivers/gpu/drm/i915/i915_reg.h  |   22 +
  drivers/gpu/drm/i915/intel_lrc.c |    2 +
  drivers/gpu/drm/i915/intel_lrc.h |    2 +
  include/uapi/drm/i915_drm.h      |   19 +-
  6 files changed, 1028 insertions(+), 95 deletions(-)
<SNIP>

+
+static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv)
+{
+       struct intel_engine_cs *engine = dev_priv->engine[RCS];
+       struct i915_gem_timeline *timeline;
+       struct drm_i915_gem_request *req;
+       int ret;
+
+       lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+       i915_gem_retire_requests(dev_priv);
+
+       req = i915_gem_request_alloc(engine, dev_priv->kernel_context);
+       if (IS_ERR(req))
+               return PTR_ERR(req);
+
+       ret = gen8_emit_oa_config(req);
+       if (ret)
+               return ret;
If we do request_alloc, aren't we always supposed to follow up with an
add_request, otherwise the kernel isn't tracking it?

Again you're right! Thanks for spotting this.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx