In a perfect world we would be able to read GPU registers of interest via
the command stream with a 'read-register' command/package. For perf counters
it is a must to read them synchronized with the GPU, so that the values can
be related to a draw command. As Vivante GPUs do not provide this
functionality, we need to emulate it in software.

We need to support three different kinds of perf registers:

 1) normal register
    This is the easiest case: we simply read the register and are done.

 2) debug register
    We need to configure the mux register and then read the debug register
    value.

 3) pipeline register
    We need to 'iterate' over all pixel pipes and sum up the values. The
    'iteration' is done by selecting the pipe of interest via
    HI_CLOCK_CONTROL_DEBUG_PIXEL_PIPE. The mux register needs to be
    configured as well.

Allowing userspace to do all of this on its own feels quite error-prone and
not future-proof. That's why the kernel exports all performance domains and
their signals to userspace via two new ioctls. This way the kernel knows all
performance counters and how to sample them.

struct drm_etnaviv_gem_submit was extended to include so-called performance
monitor requests (pmrs). A request defines which domain and signal should be
sampled (pre/post draw cmdbuffer) and where to store the result.

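As a rough illustration of the three read strategies described above, here
is a minimal user-space C sketch against a simulated register file.
Everything in it (struct fake_gpu, the REG_* indices, gpu_read()/gpu_write()
and the read_normal()/read_debug()/read_pipeline() helpers) is hypothetical
and only models the access pattern. It is not the code from this series and
does not use real Vivante register offsets.

```c
#include <stdint.h>

/* Hypothetical register indices of a simulated GPU (not real offsets). */
enum { REG_COUNTER, REG_MUX, REG_DEBUG, REG_PIPE_SELECT };

#define NUM_PIPES   2
#define NUM_SIGNALS 4

struct fake_gpu {
	uint32_t counter;	/* value behind a 'normal' register */
	uint32_t mux;		/* currently selected debug signal */
	uint32_t pipe;		/* currently selected pixel pipe */
	uint32_t signals[NUM_PIPES][NUM_SIGNALS]; /* per-pipe debug values */
};

static uint32_t gpu_read(struct fake_gpu *gpu, uint32_t reg)
{
	switch (reg) {
	case REG_COUNTER:
		return gpu->counter;
	case REG_MUX:
		return gpu->mux;
	case REG_DEBUG:
		/* The debug register reflects the current mux and pipe. */
		return gpu->signals[gpu->pipe][gpu->mux];
	case REG_PIPE_SELECT:
		return gpu->pipe;
	default:
		return 0;
	}
}

static void gpu_write(struct fake_gpu *gpu, uint32_t reg, uint32_t val)
{
	if (reg == REG_MUX)
		gpu->mux = val % NUM_SIGNALS;
	else if (reg == REG_PIPE_SELECT)
		gpu->pipe = val % NUM_PIPES;
}

/* 1) normal register: a plain read is enough. */
static uint32_t read_normal(struct fake_gpu *gpu)
{
	return gpu_read(gpu, REG_COUNTER);
}

/* 2) debug register: program the mux, then read the debug register. */
static uint32_t read_debug(struct fake_gpu *gpu, uint32_t signal)
{
	gpu_write(gpu, REG_MUX, signal);
	return gpu_read(gpu, REG_DEBUG);
}

/* 3) pipeline register: select each pixel pipe in turn and sum up. */
static uint32_t read_pipeline(struct fake_gpu *gpu, uint32_t signal)
{
	uint32_t sum = 0;

	for (uint32_t pipe = 0; pipe < NUM_PIPES; pipe++) {
		gpu_write(gpu, REG_PIPE_SELECT, pipe);
		sum += read_debug(gpu, signal);
	}

	/* Mirror the v5 changelog note: switch back to pixel pipe 0. */
	gpu_write(gpu, REG_PIPE_SELECT, 0);

	return sum;
}
```

In this model the pipeline read is just the debug read repeated per pipe,
which is why the mux has to be programmed in case 3) as well.
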
The whole series can be found here:
https://github.com/austriancoder/linux/tree/perfmon-v5

The libdrm and mesa branches used to test this feature can be found here:
https://github.com/austriancoder/libdrm/commits/perfmon-v5
https://github.com/austriancoder/mesa/commits/perfmon-v5

GALLIUM_HUD=help will report the following query names:
  fps
  cpu
  cpu0
  cpu1
  cpu2
  cpu3
  prims-emitted
  draw-calls
  rs-operations
  hi-total-cyles
  hi-idle-cyles
  hi-axi-cycles-read-request-stalled
  hi-axi-cycles-write-request-stalled
  hi-axi-cycles-write-data-stalled
  pe-pixel-count-killed-by-color-pipe
  pe-pixel-count-killed-by-depth-pipe
  pe-pixel-count-drawn-by-color-pipe
  pe-pixel-count-drawn-by-depth-pipe
  sh-shader-cycles
  sh-ps-inst-counter
  sh-rendered-pixel-counter
  sh-vs-inst-counter
  sh-rendered-vertice-counter
  sh-vtx-branch-inst-counter
  sh-vtx-texld-inst-counter
  sh-plx-branch-inst-counter
  sh-plx-texld-inst-counter
  pa-input-vtx-counter
  pa-input-prim-counter
  pa-output-prim-counter
  pa-depth-clipped-counter
  pa-trivial-rejected-counter
  pa-culled-counter
  se-culled-triangle-count
  se-culled-lines-count
  ra-valid-pixel-count
  ra-total-quad-count
  ra-valid-quad-count-after-early-z
  ra-total-primitive-count
  ra-pipe-cache-miss-counter
  ra-prefetch-cache-miss-counter
  ra-pculled-quad-count
  tx-total-bilinear-requests
  tx-total-trilinear-requests
  tx-total-discarded-texutre-requests
  tx-total-texutre-requests
  tx-mem-read-count
  tx-mem-read-in-8b-count
  tx-cache-miss-count
  tx-cache-hit-texel-count
  tx-cache-miss-texel-count
  mc-total-read-req-8b-from-pipeline
  mc-total-read-req-8b-from-ip
  mc-total-write-req-8b-from-pipeline

Changes v1 -> v2:
 - reworked events
 - reworked uapi
 - reworked enumeration of domains and signals
 - process sync point with a work item to keep irq as fast as possible
 - prevent GPU hang when reading pixel pipeline perf values
 - all SH perf counters are accessed via perf_reg_read(..)

Changes v2 -> v3:
 - reworked alloc_event(..)
 - fixed pmr flag validation

Changes v3 -> v4:
 - cherry picked the correct commits (patches 03 and 04)

Changes v4 -> v5:
 - switch back to pixel pipe 0 to prevent GPU hang
 - only supported performance domains and signals get exported for a
   specific pipe
 - reworked debug register handling
 - renamed pmrs_* to sync_point_*
 - call event_free(..) in sync_point_worker(..)

Happy reviewing!

Christian Gmeiner (25):
  drm/etnaviv: use bitmap to keep track of events
  drm/etnaviv: make it possible to allocate multiple events
  drm/etnaviv: add infrastructure to query perf counter
  drm/etnaviv: add uapi for perfmon feature
  drm/etnaviv: add internal representation of perfmon_request
  drm/etnaviv: extend etnaviv_gpu_cmdbuf_new(..) with nr_pmrs
  drm/etnaviv: add performance monitor request validation
  drm/etnaviv: copy pmrs from userspace
  drm/etnaviv: add performance monitor request processing
  drm/etnaviv: add 'sync point' support
  drm/etnaviv: clear alloced event
  drm/etnaviv: use 'sync points' for performance monitor requests
  drm/etnaviv: add HI perf domain
  drm/etnaviv: add PE perf domain
  drm/etnaviv: add SH perf domain
  drm/etnaviv: add PA perf domain
  drm/etnaviv: add SE perf domain
  drm/etnaviv: add RA perf domain
  drm/etnaviv: add TX perf domain
  drm/etnaviv: add MC perf domain
  drm/etnaviv: need to disable clock gating when doing profiling
  drm/etnaviv: move disabling of debug registers to the GPU init path
  drm/etnaviv: do not enable debug registers unconditionally
  drm/etnaviv: enable debug registers on demand
  drm/etnaviv: submit supports performance monitor requests

 drivers/gpu/drm/etnaviv/Makefile             |   3 +-
 drivers/gpu/drm/etnaviv/etnaviv_buffer.c     |  36 ++
 drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.c     |  15 +-
 drivers/gpu/drm/etnaviv/etnaviv_cmdbuf.h     |   6 +-
 drivers/gpu/drm/etnaviv/etnaviv_drv.c        |  39 ++-
 drivers/gpu/drm/etnaviv/etnaviv_drv.h        |   1 +
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  69 +++-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c        | 200 +++++++++--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h        |  13 +-
 drivers/gpu/drm/etnaviv/etnaviv_perfmon.c    | 495 +++++++++++++++++++++++++++
 drivers/gpu/drm/etnaviv/etnaviv_perfmon.h    |  49 +++
 include/uapi/drm/etnaviv_drm.h               |  43 ++-
 12 files changed, 924 insertions(+), 45 deletions(-)
 create mode 100644 drivers/gpu/drm/etnaviv/etnaviv_perfmon.c
 create mode 100644 drivers/gpu/drm/etnaviv/etnaviv_perfmon.h

-- 
2.13.5