Re: [PATCH] i915/pmu: Wire GuC backend to per-client busyness

Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx> · Tue, 14 Jun 2022 09:32:57 -0700

On Tue, Jun 14, 2022 at 02:30:42PM +0100, Tvrtko Ursulin wrote:

On 14/06/2022 01:46, Nerlige Ramappa, Umesh wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>

GuC provides engine_id and last_switch_in ticks for an active context in the
pphwsp. The context image provides a 32 bit total ticks which is the accumulated
by the context (a.k.a. context[CTX_TIMESTAMP]). This information is used to
calculate the context busyness as follows:

If the engine_id is valid, then busyness is the sum of accumulated total ticks
and active ticks. Active ticks is calculated with current gt time as reference.

If engine_id is invalid, busyness is equal to accumulated total ticks.

Since KMD (CPU) retrieves busyness data from 2 sources - GPU and GuC, a
potential race was highlighted in an earlier review that can lead to double
accounting of busyness. While the solution to this is a wip, busyness is still
usable for platforms running GuC submission.

Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 11 +++-
 drivers/gpu/drm/i915/gt/intel_context.h       |  6 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drm_client.c        |  6 +-
 6 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4070cb5711d8..a49f313db911 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -576,16 +576,23 @@ void intel_context_bind_parent_child(struct intel_context *parent,
 	child->parallel.parent = parent;
 }
-u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
 {
 	u64 total, active;
+	if (ce->ops->update_stats)
+		ce->ops->update_stats(ce);
+
 	total = ce->stats.runtime.total;
 	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
 		total *= ce->engine->gt->clock_period_ns;
 	active = READ_ONCE(ce->stats.active);
-	if (active)
+	/*
+	 * GuC backend returns the actual time the context was active, so skip
+	 * the calculation here for GuC.
+	 */
+	if (active && !intel_engine_uses_guc(ce->engine))

What is the point of looking at ce->stats.active in GuC mode? I see 
that guc_context_update_stats/__guc_context_update_clks touches it, 
but I can't spot that there is a purpose to it. This is the only 
conditional reading it but it is short-circuited in GuC case.

Also, since a GuC only vfunc (update_stats) has been added, I wonder 
why not just fork the whole runtime query (ce->get_total_runtime_ns). 
I think that would end up cleaner.

 		active = intel_context_clock() - active;
 	return total + active;

In case of GuC the active is used directly here since the active updated 
in update_stats is equal to the active time of the context already. I 
will look into separate vfunc.

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index b7d3214d2cdd..5fc7c19ab29b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -56,7 +56,7 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
 	return !!ce->parallel.number_children;
 }

snip

+static void __guc_context_update_clks(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_gt *gt = ce->engine->gt;
+	u32 *pphwsp, last_switch, engine_id;
+	u64 start_gt_clk = 0, active = 0;

No need to init these two.

+	unsigned long flags;
+	ktime_t unused;
+
+	spin_lock_irqsave(&guc->timestamp.lock, flags);
+
+	pphwsp = ((void *)ce->lrc_reg_state) - LRC_STATE_OFFSET;
+	last_switch = READ_ONCE(pphwsp[PPHWSP_GUC_CONTEXT_USAGE_STAMP_LO]);
+	engine_id = READ_ONCE(pphwsp[PPHWSP_GUC_CONTEXT_USAGE_ENGINE_ID]);
+
+	guc_update_pm_timestamp(guc, &unused);
+
+	if (engine_id != 0xffffffff && last_switch) {
+		start_gt_clk = READ_ONCE(ce->stats.runtime.start_gt_clk);
+		__extend_last_switch(guc, &start_gt_clk, last_switch);
+		active = intel_gt_clock_interval_to_ns(gt, guc->timestamp.gt_stamp - start_gt_clk);
+		WRITE_ONCE(ce->stats.runtime.start_gt_clk, start_gt_clk);
+		WRITE_ONCE(ce->stats.active, active);
+	} else {
+		lrc_update_runtime(ce);

Why is this called from here? Presumably it was called already from 
guc_context_unpin if here code things context is not active. Or will 
be called shortly, once context save is done.

guc_context_unpin is only called in the path of ce->sched_disable. The 
sched_disable is implemented in GuC (H2G message). Once the 
corresponding G2H response is received, the context is actually 
unpinned, eventually calling guc_context_unpin. Also the context may not 
necessarily be disabled after each context exit.


Also, some comments here describing the double accounting race would 
be good. Or if there are some already in the physical engine code just 
reference that function.

Will do,

Thanks,
Umesh

Regards,

Tvrtko