On 5 June 2017 at 15:48, Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> wrote: > From: Robert Bragg <robert@xxxxxxxxxxxxx> > > Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all > share (more-or-less) the same OA unit design. > > Of particular note in comparison to Haswell: some OA unit HW config > state has become per-context state and as a consequence it is somewhat > more complicated to manage synchronous state changes from the cpu while > there's no guarantee of what context (if any) is currently actively > running on the gpu. > > The periodic sampling frequency which can be particularly useful for > system-wide analysis (as opposed to command stream synchronised > MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to > have become per-context save and restored (while the OABUFFER > destination is still a shared, system-wide resource). > > This support for gen8+ takes care to consider a number of timing > challenges involved in synchronously updating per-context state > primarily by programming all config state from the cpu and updating all > current and saved contexts synchronously while the OA unit is still > disabled. > > The driver intentionally avoids depending on command streamer > programming to update OA state considering the lack of synchronization > between the automatic loading of OACTXCONTROL state (that includes the > periodic sampling state and enable state) on context restore and the > parsing of any general purpose BB the driver can control. I.e. this > implementation is careful to avoid the possibility of a context restore > temporarily enabling any out-of-date periodic sampling state. In > addition to the risk of transiently-out-of-date state being loaded > automatically; there are also internal HW latencies involved in the > loading of MUX configurations which would be difficult to account for > from the command streamer (and we only want to enable the unit when once > the MUX configuration is complete). > > Since the Gen8+ OA unit design no longer supports clock gating the unit > off for a single given context (which effectively stopped any progress > of counters while any other context was running) and instead supports > tagging OA reports with a context ID for filtering on the CPU, it means > we can no longer hide the system-wide progress of counters from a > non-privileged application only interested in metrics for its own > context. Although we could theoretically try and subtract the progress > of other contexts before forwarding reports via read() we aren't in a > position to filter reports captured via MI_REPORT_PERF_COUNT commands. > As a result, for Gen8+, we always require the > dev.i915.perf_stream_paranoid to be unset for any access to OA metrics > if not root. > > v5: Drain submitted requests when enabling metric set to ensure no > lite-restore erases the context image we just updated (Lionel) > > v6: In addition to drain, switch to kernel context & update all > context in place (Chris) > > v7: Add missing mutex_unlock() if switching to kernel context fails > (Matthew) > > v8: Simplify OA period/flex-eu-counters programming by using the > batchbuffer instead of modifying ctx-image (Lionel) > > v9: Back to updating the context image (due to erroneous testing, > batchbuffer programming the OA unit doesn't actually work) > (Lionel) > Pin context before updating context image (Chris) > Drop MMIO programming now that we switch to a kernel context with > right values in initial context image (Chris) > > v10: Just pin_map the contexts we want to modify or let the > configuration happen on first use (Chris) > > v11: Update kernel context OA config through the batchbuffer rather > than on the fly ctx-image update (Lionel) > > v12: Rework OA context registers update again by swithing away from > user contexts and reconfiguring the kernel context through the > batchbuffer and updating all the other contexts' context image. > Also take care to lock slice/subslice configuration when OA is > on. (Lionel) > > v13: Request rpcs updates on all engine when updating the OA config > (Lionel) > > v14: Drop any kind of rpcs management now that we monitor sseu > configuration changes in a later patch (Lionel) > Remove usleep after programming the NOA configs on Gen8+, this > doesn't seem to be needed (Lionel) > > v15: Respect coding style for block comments (Chris) > > Signed-off-by: Robert Bragg <robert@xxxxxxxxxxxxx> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> > Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx> \o/ > --- > drivers/gpu/drm/i915/i915_drv.h | 46 +- > drivers/gpu/drm/i915/i915_perf.c | 1032 +++++++++++++++++++++++++++++++++++--- > drivers/gpu/drm/i915/i915_reg.h | 22 + > drivers/gpu/drm/i915/intel_lrc.c | 2 + > drivers/gpu/drm/i915/intel_lrc.h | 2 + > include/uapi/drm/i915_drm.h | 19 +- > 6 files changed, 1028 insertions(+), 95 deletions(-) <SNIP> > + > +static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv) > +{ > + struct intel_engine_cs *engine = dev_priv->engine[RCS]; > + struct i915_gem_timeline *timeline; > + struct drm_i915_gem_request *req; > + int ret; > + > + lockdep_assert_held(&dev_priv->drm.struct_mutex); > + > + i915_gem_retire_requests(dev_priv); > + > + req = i915_gem_request_alloc(engine, dev_priv->kernel_context); > + if (IS_ERR(req)) > + return PTR_ERR(req); > + > + ret = gen8_emit_oa_config(req); > + if (ret) > + return ret; If we do request_alloc, aren't we always supposed to follow up with an add_request, otherwise the kernel isn't tracking it? _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx