From: Sourab Gupta <sourab.gupta@xxxxxxxxx>

This series adds a framework for collecting GPU performance metrics associated with the command stream of a particular engine. These metrics include OA reports, timestamps, mmio metrics, etc., and are collected around batchbuffer boundaries.

This work utilizes the underlying infrastructure introduced in Robert Bragg's patches for collecting periodic OA counter snapshots (based on Haswell):
https://lists.freedesktop.org/archives/intel-gfx/2016-February/086909.html

This patch set is based on the Gen8+ version of Robert's patch series, which can be found here:
https://github.com/rib/linux/tree/wip/rib/oa-next

Those patches have not yet been floated individually on the mailing list, which I hope doesn't lead to any significant loss of clarity when reviewing the work proposed in this series.

Compared to the series sent earlier, this one is based on the drm i915 ioctl based implementation (which can be referred to in Robert's work). As such, the design has been changed (and simplified), since some earlier core perf assumptions no longer apply. A few salient features are listed below:

* Ability to collect command stream based OA reports on the render engine, in conjunction with the periodic reports generated by the OA unit. These are collected in separate buffers and forwarded to userspace in timestamp order. Userspace differentiates the samples by the value of the OA sample source field (see the read sketch after this list).

* Ability to collect timestamps and mmio metrics associated with the command stream of any particular gpu engine. The particular sample metrics to be collected are requested by the userspace client in the properties associated with the stream being opened, and the samples generated depend on the sample flags requested in the stream properties (see the stream-open sketch after this list).

* Ability to collect associated metadata, such as pid, tags, etc., with the samples. These are collected at the time the commands are inserted into the command stream of the particular gpu engine, and are forwarded along with the samples.

* Multiple streams belonging to different engines can be opened concurrently (while restricting each engine to one open stream). This allows us to open streams for different gpu engines simultaneously and collect samples from all of them concurrently.

* The different stages of a single workload (belonging to a single context) can be delimited using the 'execbuffer tagging' mechanism introduced here. For example, in the media pipeline the CodecHAL encoding stage has a single context but involves multiple stages such as Scaling, ME, MBEnc and PAK, each with its own execbuffer call. The samples generated need to carry this information so that they can be associated with the particular workload stage. A tag sample_type, passed in by userspace during the execbuffer ioctl, fulfills this requirement (see the tagging sketch after this list).

I am looking for feedback on the design proposed here, particularly pertaining to the mechanics of metrics collection through insertion of commands into the command stream of the associated gpu engines, sample generation according to the requested sample flags in the stream properties, concurrent operation of different streams collecting samples from multiple gpu engines, and any such design/implementation aspects per se.
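To make the stream-open flow concrete, here is a minimal userspace sketch. The open_param/property layout and the SAMPLE_OA/OA_METRICS_SET/OA_FORMAT/OA_EXPONENT names follow the ioctl-based interface in Robert's series; the CS_TIMESTAMP, PID, TAG and ENGINE properties are hypothetical placeholders standing in for what this series proposes, not settled uAPI:

    /* Hedged sketch: open an i915 perf stream mixing periodic OA reports
     * with command stream samples.  Properties marked "hypothetical" are
     * illustrative stand-ins for this series' proposed additions.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    static int open_cs_oa_stream(int drm_fd, uint64_t metrics_set)
    {
        uint64_t properties[] = {
            /* periodic OA reports from the OA unit */
            DRM_I915_PERF_PROP_SAMPLE_OA, 1,
            DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set,
            DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A32u40_A4u32_B8_C8,
            DRM_I915_PERF_PROP_OA_EXPONENT, 16,

            /* hypothetical properties proposed by this series */
            DRM_I915_PERF_PROP_SAMPLE_CS_TIMESTAMP, 1, /* hypothetical */
            DRM_I915_PERF_PROP_SAMPLE_PID, 1,          /* hypothetical */
            DRM_I915_PERF_PROP_SAMPLE_TAG, 1,          /* hypothetical */
            DRM_I915_PERF_PROP_ENGINE, 0,              /* hypothetical: 0 = render */
        };
        struct drm_i915_perf_open_param param = {
            .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_FD_NONBLOCK,
            .num_properties = sizeof(properties) / 16, /* (key, value) pairs */
            .properties_ptr = (uintptr_t)properties,
        };

        /* returns a new stream fd to read() samples from, or -1 on error */
        return ioctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
    }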
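On the read side, periodic and CS samples arrive interleaved in timestamp order and are told apart by the sample source field. The record header and DRM_I915_PERF_RECORD_SAMPLE type are from Robert's interface; the payload layout (a leading u32 source field) and the I915_OA_SAMPLE_SOURCE_PERIODIC name are assumptions for illustration only:

    /* Hedged sketch: drain a stream fd and branch on the assumed
     * sample-source field; names marked hypothetical are not real uAPI.
     */
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <drm/i915_drm.h>

    static void drain_stream(int stream_fd)
    {
        uint8_t buf[128 * 1024];
        ssize_t len = read(stream_fd, buf, sizeof(buf));
        ssize_t off = 0;
        unsigned periodic = 0, cs = 0;

        if (len < 0)
            return;

        while (off + (ssize_t)sizeof(struct drm_i915_perf_record_header) <= len) {
            struct drm_i915_perf_record_header *hdr =
                (struct drm_i915_perf_record_header *)(buf + off);

            if (hdr->size == 0)
                break;

            if (hdr->type == DRM_I915_PERF_RECORD_SAMPLE) {
                /* assumed layout: a u32 sample-source field leads the
                 * payload, ahead of the requested OA/ts/pid/tag data */
                uint32_t source;
                memcpy(&source, hdr + 1, sizeof(source));

                if (source == I915_OA_SAMPLE_SOURCE_PERIODIC) /* hypothetical */
                    periodic++; /* report generated by the OA unit timer */
                else
                    cs++;       /* report captured at a batchbuffer boundary */
            }
            off += hdr->size; /* size covers header plus payload */
        }
    }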
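For the execbuffer tagging mechanism, the sketch below assumes the stage tag is passed through the rsvd2 field of drm_i915_gem_execbuffer2; that placement and the stage values are purely illustrative assumptions, and the series itself defines where the tag actually lives:

    /* Hedged sketch: tag each workload stage at execbuffer time so the
     * resulting CS samples can be attributed to it.  Passing the tag via
     * rsvd2 is an assumption for illustration.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    static int submit_tagged(int drm_fd, struct drm_i915_gem_execbuffer2 *eb,
                             uint32_t ctx_id, uint32_t stage_tag)
    {
        i915_execbuffer2_set_context_id(*eb, ctx_id);
        /* hypothetical tag placement; e.g. Scaling=1, ME=2, MBEnc=3, PAK=4 */
        eb->rsvd2 = stage_tag;
        return ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb);
    }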
A few open issues I'm still working on:

* When both timestamp and OA sample types are requested for the render engine, the timestamp should be derivable from the OA report alone, so we should not need to insert separate commands for dumping timestamps. We do, however, need to apply the relevant timestamp base conversion to turn OA timestamps into ns.

* Sample consistency has to be maintained between the periodic OA reports and the ones generated from the command stream. This implies, for example, that if the pid sample_type is requested, the most recent pid collected in the CS samples should be used to populate the relevant field in the periodic samples. Likewise, for periodic OA reports the 'ctx_id' field needs to be deduced from the report and mapped to 'intel_context::global_id'.

These open issues, though, shouldn't distract us too much from reviewing the general mechanism proposed here; they can be ironed out subsequently if there's general agreement on the design.

Also, one prerequisite for this work is a globally unique id associated with each context. The present context id is specific to a drm fd, so it can't be used to globally associate the generated reports with the corresponding context scheduled from userspace. The first few patches in the series introduce the globally unique context id, and the subsequent ones introduce the framework for collecting the metrics.

Robert Bragg (2):
  drm/i915: Constrain intel_context::global_id to 20 bits
  drm/i915: return ctx->global_id from intel_execlists_ctx_id()

Sourab Gupta (9):
  drm/i915: Introduce global id for contexts
  drm/i915: Add ctx getparam ioctl parameter to retrieve ctx global id
  drm/i915: Expose OA sample source to userspace
  drm/i915: Framework for capturing command stream based OA reports
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support to add execbuffer tags to OA counter reports
  drm/i915: Extend i915 perf framework for collecting timestamps on all gpu engines
  drm/i915: Support opening multiple concurrent perf streams
  drm/i915: Support for capturing MMIO register values

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_drv.h            |   68 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |   23 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +
 drivers/gpu/drm/i915/i915_perf.c           | 1300 +++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_reg.h            |    2 +
 drivers/gpu/drm/i915/intel_lrc.c           |   26 +-
 drivers/gpu/drm/i915/intel_lrc.h           |    2 +-
 include/uapi/drm/i915_drm.h                |   72 ++
 9 files changed, 1349 insertions(+), 156 deletions(-)
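As an illustration of the proposed getparam extension, here is a minimal sketch of how userspace might query the global context id. The context getparam ioctl and struct are existing uAPI; the I915_CONTEXT_PARAM_GLOBAL_ID name is a hypothetical stand-in for the parameter this series adds:

    /* Hedged sketch: retrieve the proposed globally unique context id.
     * I915_CONTEXT_PARAM_GLOBAL_ID is a hypothetical name.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    static int get_ctx_global_id(int drm_fd, uint32_t ctx_id,
                                 uint32_t *global_id)
    {
        struct drm_i915_gem_context_param p = {
            .ctx_id = ctx_id,
            .param  = I915_CONTEXT_PARAM_GLOBAL_ID, /* hypothetical */
        };
        int ret = ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);

        if (ret == 0)
            *global_id = (uint32_t)p.value; /* constrained to 20 bits per this series */
        return ret;
    }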