Re: [RFC 4/6] drm/i915/pmu: Add queued counter

On 22/01/2018 18:56, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-01-22 18:43:56)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies and
unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
  * Rebase for name change and re-order.
  * Drop floating point constant. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_pmu.c         | 40 +++++++++++++++++++++++++++++----
  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
  include/uapi/drm/i915_drm.h             |  9 +++++++-
  3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index cbfca4a255ab..8eefdf09a30a 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -36,7 +36,8 @@
  #define ENGINE_SAMPLE_MASK \
         (BIT(I915_SAMPLE_BUSY) | \
          BIT(I915_SAMPLE_WAIT) | \
-        BIT(I915_SAMPLE_SEMA))
+        BIT(I915_SAMPLE_SEMA) | \
+        BIT(I915_SAMPLE_QUEUED))
  #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)

@@ -220,6 +221,11 @@ static void engines_sample(struct drm_i915_private *dev_priv)
                 update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
                               PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+
+               if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+                       update_sample(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+                                     I915_SAMPLE_QUEUED_DIVISOR,
+                                     atomic_read(&engine->request_stats.queued));

engine->request_stats.foo works for me, and reads quite nicely.

+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.01

+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (100)

I'm just thinking of favouring the sampler arithmetic by using 128. As
far as userspace is concerned, the difference is not going to be that
noticeable, even less so if you chose 256.

I'll do 1024 then, but the CPU usage in the sampling thread is so low anyway.
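
For illustration only, a minimal sketch of why a power-of-two divisor
favours the arithmetic: scaling becomes a shift, and userspace recovers the
real value with a single division. The macro and helper names below are
hypothetical and are not taken from the patch; the averaging step is an
assumption about how a sampler might use the divisor, not a description of
what i915_pmu.c actually does.

```c
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
#define QUEUED_DIVISOR_SHIFT 10
#define QUEUED_DIVISOR (1u << QUEUED_DIVISOR_SHIFT) /* 1024 */

/*
 * Sampler side (assumed): turn a sum of sampled queue depths into a
 * fixed-point average. With a power-of-two divisor the scaling is a
 * left shift rather than a multiplication by 100.
 */
static inline uint64_t queued_scale(uint64_t sum_of_samples,
				    uint32_t nr_samples)
{
	if (nr_samples == 0)
		return 0;

	return (sum_of_samples << QUEUED_DIVISOR_SHIFT) / nr_samples;
}

/*
 * Consumer side: divide the counter value by the divisor to get the
 * real value, as the uAPI comment in the patch describes.
 */
static inline double queued_to_real(uint64_t counter)
{
	return (double)counter / QUEUED_DIVISOR;
}
```

With 1024 the resolution is about 0.001 of a request instead of 0.01, so
userspace loses nothing compared to a divisor of 100.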

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



