From: Sourab Gupta <sourab.gupta@xxxxxxxxx>

This is an updated patch series (change list at the end) which adds
support for capturing OA counter snapshots for multiple contexts by
inserting MI_REPORT_PERF_COUNT commands into the CS, and forwarding
these snapshots to userspace through the perf interface.

This work is based on Robert Bragg's perf event framework, the patches
for which were posted earlier:
http://lists.freedesktop.org/archives/intel-gfx/2015-May/066102.html

Robert's perf event framework enabled capture of periodic OA counter
snapshots by configuring the OA unit during perf event init. The raw OA
reports generated by the HW are then forwarded to userspace using the
perf APIs. But there are use cases which need more than the periodic OA
capture functionality currently supported by perf_event. A few such use
cases are:
 - Ability to capture system-wide metrics. It should be possible to map
   the captured reports back to individual contexts.
 - Ability to inject tags for work into the reports. This provides
   visibility into the multiple stages of work within a single context.

The framework proposed here may also be seen as a way to overcome a
limitation of Haswell, which doesn't write out a context ID with OA
reports. Handling this in the kernel makes sense when we plan for
compatibility with Broadwell, which does include the context ID in
reports.

These use cases can be addressed by inserting commands into the ring,
before and after the batchbuffer, to dump the OA counter snapshots. The
generated reports can have an additional footer appended to carry
metadata such as ctx id, pid, tags, etc. The specific issue of counter
wraparound due to large batchbuffers can be worked around by using these
reports in conjunction with periodic OA snapshots. Such per-BB data
gives userspace tools useful information for analyzing performance and
timing at batchbuffer level.

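To illustrate the wraparound point above, here is a minimal userspace
sketch (it is not code from these patches). It assumes 32-bit counter
values, snapshots already ordered by time, and at most one wrap of the
counter between two consecutive snapshots. The function name
oa_counter_delta is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Total increase of one 32-bit counter across n time-ordered snapshots.
 * snap[0] is the value captured before the batchbuffer, snap[n - 1] the
 * value captured after it, and any entries in between are periodic
 * snapshots taken while the batch was running.
 *
 * Assumption: the counter wraps at most once between two consecutive
 * snapshots, so each unsigned 32-bit subtraction gives the true step.
 */
static uint64_t oa_counter_delta(const uint32_t *snap, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(snap != NULL && n >= 1);

	for (i = 1; i < n; i++)
		total += (uint32_t)(snap[i] - snap[i - 1]);

	return total;
}
```

With only the begin/end pair, a batch long enough to wrap the counter
more than once would be under-counted. Summing the steps across the
intermediate periodic snapshots recovers the full delta as long as the
sampling period is shorter than the wrap time.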
An application intending to profile its own contexts can do so by
submitting the MI_REPORT_PERF_COUNT commands into the CS from userspace
itself. But consider the use case of a system-wide GPU profiler tool
which needs the data for all the workloads being scheduled on the GPU
globally. Doing this in the kernel is significantly less complex than
supporting such a use case through userspace.

This framework is intended to serve such system-wide GPU profilers,
which may further use this data for purposes such as performance
analysis (at a global level), identifying optimization scenarios for
improving GPU utilization, CPU vs GPU timing analysis, etc. Again, this
is made possible by the presence of metadata with individual reports,
which is enabled by this framework.

One such system-wide GPU profiler tool is the MVP (Modular Video
Profiler) tool, used by the media team for profiling media workloads.

The current implementation approach is to forward these samples through
the same PERF_SAMPLE_RAW sample type as is used for periodic samples,
with an additional footer appended for metadata. Userspace can then
distinguish the two kinds of samples on the basis of sample size.

One of the other approaches being contemplated right now is creating
separate sample types to handle these different kinds of samples. There
would be different fds associated with these different sample types,
though they could be part of one event group. Userspace could listen to
either or both of these sample types by specifying event attributes
during event init.

But right now, I see this work as a future refinement, dependent on
acceptance of the general framework as such. For now, I'm looking for
feedback on these initial patches w.r.t. the usage of the perf APIs and
the interaction with i915.

Another feature introduced in these patches is execbuffer tagging.
It is a mechanism whereby the reports collected are marked with a tag
passed by userspace during the execbuffer call. This way the userspace
tool can associate the reports collected with the corresponding
execbuffers. This satisfies the requirement to have visibility into
multiple stages (i.e. execbuffers) lying within a single context.
For example, in the media pipeline, the CodecHAL encoding stage has a
single context and involves multiple stages such as Scaling, ME, MBEnc
and PAK, for which there are separate execbuffer calls. The reports
generated need to have the granularity of these multiple stages of a
context. The presence of a tag in the report metadata fulfills this
requirement.

One prerequisite for this work is the presence of a globally unique
context id. The context id right now is specific to a drm file instance.
As such, it can't be used on its own to associate the generated reports
with the corresponding context scheduled from userspace in a global way.
In the absence of a globally unique context id, other metadata such as
pid/tags may be used in conjunction with the ctx id to associate reports
with their corresponding contexts.

The first patch in the series proposes a way of implementing a globally
unique context id. I'm looking for comments on the pros and cons of
having a global ctx id. This implementation can be refined if the
approach is acceptable.

The subsequent patches introduce the multi-context OA capture mode and
the mechanism to forward these snapshots using perf.

This patch set currently supports Haswell. Gen8+ support can be added
once the basic framework is agreed upon.

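As a rough illustration of how a userspace tool might tell the two kinds
of samples apart by size and pick up the ctx id, pid and tag, consider
the sketch below. It is not the uapi defined by these patches: the
256-byte raw report size, the footer layout (all three metadata fields
present, in this order), and every name in the block are assumptions
made for the example. Any padding the perf core adds to raw samples is
ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of one raw OA report for the configured report format. */
#define OA_REPORT_SIZE 256u

/* Hypothetical metadata footer appended to CS-based reports. */
struct oa_sample_footer {
	uint32_t ctx_id;
	uint32_t pid;
	uint32_t tag;
};

enum oa_sample_kind {
	OA_SAMPLE_PERIODIC,	/* raw report only */
	OA_SAMPLE_CS,		/* raw report followed by the footer */
	OA_SAMPLE_UNKNOWN,
};

/*
 * Classify one raw sample payload by its size. For CS-based samples the
 * footer is copied out so the caller can map the report back to a
 * context (ctx_id, pid) and to a stage within that context (tag).
 */
static enum oa_sample_kind
oa_classify_sample(const uint8_t *data, size_t size,
		   struct oa_sample_footer *footer)
{
	assert(data != NULL && footer != NULL);

	if (size == OA_REPORT_SIZE)
		return OA_SAMPLE_PERIODIC;

	if (size == OA_REPORT_SIZE + sizeof(*footer)) {
		memcpy(footer, data + OA_REPORT_SIZE, sizeof(*footer));
		return OA_SAMPLE_CS;
	}

	return OA_SAMPLE_UNKNOWN;
}
```

A tool could then group OA_SAMPLE_CS reports by (pid, ctx_id) while the
ctx id is not yet globally unique, and use the tag to separate the
execbuffers of one context.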
v2: This patch series has the following changes wrt the one floated
earlier:
 - Removing synchronous waits during event stop/destroy
 - Segregating the book-keeping data for the samples from the
   destination buffer and collecting it into a separate list
 - Managing the lifetime of the destination buffer with the help of gem
   active reference tracking
 - Having the scope of the i915 device mutex limited to places of gem
   interaction and having the pmu data structures protected with a
   per-pmu lock
 - Userspace can now control the metadata it wants by requesting the
   same during event init. The sample is sent with the requested
   metadata in a packed format.
 - Some patches merged together and a few more introduced

Sourab Gupta (8):
  drm/i915: Have globally unique context ids, as opposed to drm file specific
  drm/i915: Introduce mode for capture of multi ctx OA reports synchronized with RCS
  drm/i915: Add mechanism for forwarding CS based OA counter snapshots through perf
  drm/i915: Forward periodic and CS based OA reports sorted acc to timestamps
  drm/i915: Handle event stop and destroy for commands in flight
  drm/i915: Insert commands for capture of OA counters in the ring
  drm/i915: Add support for having pid output with OA report
  drm/i915: Add support to add execbuffer tags to OA counter reports

 drivers/gpu/drm/i915/i915_debugfs.c        |   4 +-
 drivers/gpu/drm/i915/i915_drv.h            |  52 ++-
 drivers/gpu/drm/i915/i915_gem_context.c    |  53 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +
 drivers/gpu/drm/i915/i915_oa_perf.c        | 604 +++++++++++++++++++++++++----
 include/uapi/drm/i915_drm.h                |  33 +-
 6 files changed, 666 insertions(+), 90 deletions(-)

-- 
1.8.5.1