From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
Cc: Robert Bragg <robert@xxxxxxxxxxxxx>, Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>, Jon Bloomfield <jon.bloomfield@xxxxxxxxx>, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>, Jabin Wu <jabin.wu@xxxxxxxxx>, Insoo Woo <insoo.woo@xxxxxxxxx>

This patch series builds upon the initial patch set floated earlier, which extends the periodic OA sampling framework, adds handling of asynchronous OA counter data, and forwards the samples using perf. That series can be seen at:
http://lists.freedesktop.org/archives/intel-gfx/2015-June/069263.html

The OA unit, as such, is specific to the render ring and can't cater to performance data requirements for other GPU engines. Specifically, media workloads may utilize other GPU engines, but there is currently no framework which can be used to query performance statistics for non-RCS workloads and provide this data to userspace tools. This patch set tries to address this specific problem.

The aim of this patch series is to build upon the perf event framework developed earlier and use it for forwarding performance data of non-RCS engine workloads. Since the previous PMU is customized to handle OA reports, a new perf PMU is added to handle generic non-OA performance data. An example of such non-OA performance data is the timestamps captured at asynchronous points during workload execution.

This patch set narrows this down further by capturing the timestamps at batch buffer boundaries, inserting the commands for this into the ringbuffer, and forwarding the samples to userspace through the perf interface. Nevertheless, the framework and data structures can be extended to introduce more performance data types (other than timestamps) and to capture these at other points of workload execution. The intention here is to introduce a framework that enables capturing generic performance data and forwarding it to userspace using the perf APIs.
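To illustrate how a userspace tool might consume timestamps captured at batch buffer boundaries, here is a minimal sketch in C. It is not code from this series: `struct bb_span`, `bb_elapsed_ticks()` and `bb_ticks_to_ns()` are hypothetical names, and the counter width and tick period are passed in as parameters because they depend on the hardware and are not stated here.

```c
#include <stdint.h>

/*
 * Hypothetical record (not the uapi layout from this series): the raw
 * timestamp values sampled at the start and end boundary of one batch buffer.
 */
struct bb_span {
	uint64_t begin_ticks;
	uint64_t end_ticks;
};

/*
 * Elapsed ticks for one batch buffer, assuming the counter is `width` bits
 * wide (1..64) and wrapped at most once between the two samples.
 */
static uint64_t bb_elapsed_ticks(const struct bb_span *span, unsigned int width)
{
	uint64_t mask = (width >= 64) ? UINT64_MAX
				      : ((UINT64_C(1) << width) - 1);

	/* Unsigned subtraction followed by masking handles a single wrap. */
	return (span->end_ticks - span->begin_ticks) & mask;
}

/*
 * Convert ticks to nanoseconds. `ns_per_tick` is an assumed, caller-supplied
 * tick period; the multiplication is not checked for overflow.
 */
static uint64_t bb_ticks_to_ns(uint64_t ticks, uint64_t ns_per_tick)
{
	return ticks * ns_per_tick;
}
```

A tool would fill one `struct bb_span` per batch buffer from the forwarded samples and call `bb_elapsed_ticks()` with the counter width of the engine it is profiling. Masking the difference keeps the result correct when the counter wraps between the two boundaries.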
The reports generated will again have an additional footer for metadata information such as ctx_id, pid, ring id and tags (in the same way as done for the OA reports in the earlier patch series). This information can be used by userspace tools such as MVP (Modular Video Profiler) to associate reports with individual contexts and with different stages of workload execution.

In this patch set, the timestamps are captured at BB boundaries by inserting the commands in the ringbuffer at the batch buffer boundaries. As specified earlier, for a system-wide GPU profiler, the relative complexity of doing this in the kernel is significantly less than supporting this use case through userspace command insertion by all the different components.

The final patch in the series extends the data structures to enable capture of up to 8 MMIO register values, in conjunction with timestamps.

Sourab Gupta (7):
  drm/i915: Add a new PMU for handling non-OA counter data profiling requests
  drm/i915: Register routines for Gen perf PMU driver
  drm/i915: Introduce timestamp node for timestamp data collection
  drm/i915: Add mechanism for forwarding the data samples to userspace through Gen PMU perf interface
  drm/i915: Wait for GPU to finish before event stop in Gen Perf PMU
  drm/i915: Add routines for inserting commands in the ringbuf for capturing timestamps
  drm/i915: Add support for retrieving MMIO register values in Gen Perf PMU

 drivers/gpu/drm/i915/i915_dma.c     |   2 +
 drivers/gpu/drm/i915/i915_drv.h     |  47 +++
 drivers/gpu/drm/i915/i915_oa_perf.c | 579 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   2 +
 include/uapi/drm/i915_drm.h         |  25 ++
 5 files changed, 655 insertions(+)

--
1.8.5.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx