Re: [PATCH 07/10] drm/i915: Gate engine stats collection with a static key

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Thu, 5 Oct 2017 08:07:03 +0100

On 04/10/2017 18:49, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2017-10-04 18:38:09)

On 03/10/2017 11:17, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2017-09-29 13:34:57)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

This reduces the cost of the software engine busyness tracking
to a single no-op instruction when there are no listeners.
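
As a rough illustration of the gating idea above, here is a minimal userspace sketch. It is not the patch's code. In the kernel the gate would be a static key (DEFINE_STATIC_KEY_FALSE, static_branch_unlikely(), static_branch_inc()/static_branch_dec()), which patches the disabled branch into a no-op. The sketch assumes a plain atomic listener count instead, so the disabled path still costs a load and a branch. All names (stats_listeners, engine_stats, engine_context_in/out) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a static key: number of active listeners. */
static atomic_int stats_listeners;

/* Hypothetical per-engine busyness bookkeeping. */
struct engine_stats {
	uint64_t busy_ns;  /* accumulated busy time */
	uint64_t start_ns; /* start of the current busy period */
	bool active;       /* a busy period is being tracked */
};

/* Cheap check on the hot path; a static key would make this a no-op. */
static inline bool stats_enabled(void)
{
	return atomic_load_explicit(&stats_listeners,
				    memory_order_relaxed) > 0;
}

/* A listener (e.g. a PMU client) starts or stops consuming stats. */
void stats_listener_get(void) { atomic_fetch_add(&stats_listeners, 1); }
void stats_listener_put(void) { atomic_fetch_sub(&stats_listeners, 1); }

/* Called when the engine starts executing a context. */
void engine_context_in(struct engine_stats *s, uint64_t now_ns)
{
	if (!stats_enabled())
		return; /* no listeners: skip all bookkeeping */
	s->start_ns = now_ns;
	s->active = true;
}

/* Called when the engine stops executing a context. */
void engine_context_out(struct engine_stats *s, uint64_t now_ns)
{
	if (!stats_enabled() || !s->active)
		return;
	s->busy_ns += now_ns - s->start_ns;
	s->active = false;
}
```

The sketch deliberately ignores a listener appearing or disappearing in the middle of a busy period; handling enable/disable ordering correctly is a separate problem.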

We add a new i915 ordered workqueue to be used only for tasks
not needing struct_mutex.

v2: Rebase and some comments.
v3: Rebase.
v4: Checkpatch fixes.
v5: Rebase.
v6: Use system_long_wq to avoid being blocked by struct_mutex
      users.
v7: Fix bad conflict resolution from last rebase. (Dmitry Rogozhkin)
v8: Rebase.
v9:
   * Fix race between unordered enable followed by disable.
     (Chris Wilson)
   * Prettify order of local variable declarations. (Chris Wilson)

Ok, I can't see a downside to enabling the optimisation even if it will
be global and not per-device/per-engine.

For this one I did a quick test with gem_exec_nop and I've seen around
0.5% reduction in time spent in intel_lrc_irq_handler in the case where
PMU is not active.

Hmm, gem_exec_nop isn't going to be favourable as there we are just
extending the busyness coverage of an engine. I think you want something
like gem_sync/sequential (or gem_exec_whisper), as there each engine
will be starting and stopping, and delays between engines will
accumulate.

Not sure if we are on the same page. Here I was referring to the CPU 
usage in the "irq" (tasklet) handler. gem_exec_nop generates a good 
number of interrupts (8k/s) and shows up in the profile at ~1.8% CPU 
without the static branch optimisation, and ~1.3% with it.

Regards,

Tvrtko