Hi all,

This is an update to a series that was sent out a few months ago. The end goal here is to optimize some media workloads. Here is some information provided by Dmitry (cc) on why we want this:

Video decoding/encoding tends to work with macroblocks, dividing up a frame into smaller elements. Dependencies exist between those macroblocks, meaning that they cannot be processed in an arbitrary order, and there is also a maximum number of macroblocks that can be processed at a given time (called the wavefront). As a result, some workloads (below a certain resolution) will not make use of all the GPU's execution units (EUs).

On a SKL GT4 (3 slices), for a transcoding workload at 720x480p, we measured a low number of active EUs (~3%) with 3 slices enabled. As we reduce the number of slices used to 1, the percentage of active EUs increases (~9%). The execution time of the workload also decreases as we decrease the number of slices used (we measured an improvement of up to ~20% with 1 slice).

It's not clear what speeds up the workload. We currently think that the power budget is redistributed to other parts (including the CPU) and that the GPU thread scheduling is also sped up because it doesn't involve as many slices. We haven't found a way to verify these assumptions.

Changing the powergating configuration doesn't come for free though. We have some numbers from an IGT benchmark on how much delay is added each time we switch between 2 contexts with different powergating configurations. Measurements are on the order of ~50us on SKL GT4 (3 slices) and ~40us on KBL GT3 (2 slices).
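To put the numbers above side by side, here is a rough back-of-envelope sketch in Python. It is not part of the series and the helper names are made up for illustration. It assumes that the same number of EUs stays busy when slices are powergated (which is what the ~3% to ~9% figures suggest, but is not something we measured directly) and that the switch cost is paid once per change of powergating configuration. The 1 second runtime in the usage lines is an arbitrary example value, not a measurement.

```python
def scaled_active_eu_pct(pct_all_slices, total_slices, enabled_slices):
    """Active-EU percentage expected with fewer slices enabled.

    Assumption: the absolute number of busy EUs does not change, so the
    percentage scales with total_slices / enabled_slices.
    """
    if not 0 < enabled_slices <= total_slices:
        raise ValueError("enabled_slices must be in 1..total_slices")
    return pct_all_slices * total_slices / enabled_slices


def net_gain_us(runtime_us, speedup, switches, switch_cost_us):
    """Time saved by the smaller configuration minus the switching overhead."""
    return runtime_us * speedup - switches * switch_cost_us


def break_even_switches(runtime_us, speedup, switch_cost_us):
    """Number of configuration switches at which the gain is fully eaten up."""
    return runtime_us * speedup / switch_cost_us


if __name__ == "__main__":
    # ~3% active EUs with 3 slices on SKL GT4, reduced to 1 slice.
    print(scaled_active_eu_pct(3.0, 3, 1))
    # Hypothetical 1 s workload, ~20% speedup, ~50us per switch.
    print(net_gain_us(1_000_000, 0.20, 100, 50))
    print(break_even_switches(1_000_000, 0.20, 50))
```

Under these assumptions the switch cost only matters for workloads that change powergating configuration very frequently relative to their runtime.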
Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (5):
  drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  drm/i915: don't specify pinned size for wa_bb pin/allocation
  drm/i915: extract per-ctx/indirect bb programming
  drm/i915: pass wa_ctx as argument
  drm/i915: reprogram NOA muxes on context switch when using perf

 drivers/gpu/drm/i915/i915_drv.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 104 +++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |  10 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  18 +-
 drivers/gpu/drm/i915/i915_perf.c           |  92 +++++++-
 drivers/gpu/drm/i915/intel_lrc.c           | 231 ++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.h           |   5 +
 include/uapi/drm/i915_drm.h                |  28 +++
 8 files changed, 419 insertions(+), 74 deletions(-)

-- 
2.17.0