Hi all,

Tests run by the GPA/vTune teams reported high CPU consumption when
polling data from the i915-perf stream. This is due to a design decision
to check the OA buffer head/tail pointers 200 times per second. The
reasoning behind not using the interrupt was initially that you could
want to process data as soon as it's available.

To avoid breaking the existing behavior of the i915-perf stream, we'll
introduce 2 new options at opening of the i915-perf stream to allow the
application to choose:

  - how often it wants the OA circular buffer registers to be checked

  - whether to make use of the interrupt

In the case of GPA, collecting 100k OA reports would initially consume
17~20% CPU. With these 2 new parameters set to 1 second for OA circular
buffer checks and the interrupt enabled, CPU usage drops to 2~3%.

I'm looking for feedback as to whether those 2 new opening parameters
are alright.

Since this might be the first time that we add a new parameter to the
perf stream open ioctl, we would also need a way to detect their
availability. So far in my experiments I've used the following trick:
specify an invalid context id parameter at open and expect ENOENT when
the new parameter is available, EINVAL otherwise. I'm open to other ways
of doing this.

Cheers,

Lionel Landwerlin (4):
  drm/i915/perf: rework aging tail workaround
  drm/i915/perf: add new open param to configure polling of OA buffer
  drm/i915/perf: handle interrupts from the OA unit
  drm/i915/perf: add interrupt enabling parameter

 drivers/gpu/drm/i915/i915_drv.h         |  59 +++--
 drivers/gpu/drm/i915/i915_irq.c         |  39 +++-
 drivers/gpu/drm/i915/i915_perf.c        | 273 +++++++++++++++---------
 drivers/gpu/drm/i915/i915_reg.h         |   7 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 include/uapi/drm/i915_drm.h             |  15 ++
 6 files changed, 276 insertions(+), 119 deletions(-)

-- 
2.20.1
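
For anyone wanting to try the availability check described in the cover
letter (invalid context id at open, ENOENT vs EINVAL), the userspace
side could be sketched roughly as below. This only shows how the result
of the probing open call might be interpreted; it does not issue the
ioctl itself. The names classify_probe and enum probe_result are
hypothetical and not part of the series or of the i915 uAPI.

```c
#include <errno.h>

/* Hypothetical helper, not part of the series: interprets the result of
 * a perf stream open that passes the new property together with a
 * deliberately invalid context id. */
enum probe_result {
    PROBE_SUPPORTED,    /* ENOENT: only the bogus context id was rejected */
    PROBE_UNSUPPORTED,  /* EINVAL: the kernel does not know the property */
    PROBE_INCONCLUSIVE, /* any other outcome, including success */
};

/* ret is the return value of the open call, err is errno captured
 * immediately after it. */
static enum probe_result classify_probe(int ret, int err)
{
    if (ret >= 0)
        return PROBE_INCONCLUSIVE; /* the probe is expected to fail */
    if (err == ENOENT)
        return PROBE_SUPPORTED;
    if (err == EINVAL)
        return PROBE_UNSUPPORTED;
    return PROBE_INCONCLUSIVE;
}
```

A caller would attempt the open with the new property plus an invalid
context id, save errno right away, and pass both values to
classify_probe(). If the open unexpectedly succeeds, the returned file
descriptor should be closed.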