On Tue, Jan 21, 2025 at 03:58:03PM -0800, Umesh Nerlige Ramappa wrote:
> On Tue, Jan 21, 2025 at 06:10:34PM -0500, Rodrigo Vivi wrote:
> > On Tue, Jan 21, 2025 at 02:25:57PM -0800, Umesh Nerlige Ramappa wrote:
> >
> > drm/i915/pmu as tag please...
>
> will do
>
> >
> > > When running igt@gem_exec_balancer@individual for multiple iterations,
> > > it is seen that the delta busyness returned by the PMU is 0. The issue stems
> > > from a combination of 2 implementation-specific details:
> > >
> > > 1) gt_park is throttling __update_guc_busyness_stats() so that it does
> > > not hog PCI bandwidth for some use cases. (Ref: 59bcdb564b3ba)
> > >
> > > 2) busyness implementation always returns monotonically increasing
> > > counters. (Ref: cf907f6d29421)
> > >
> > > If an application queried an engine while it was active,
> > > engine->stats.guc.running is set to true. Following that, if all PM
> > > wakerefs are released, then the gt is parked. At this time the throttling
> > > of __update_guc_busyness_stats() may result in a missed update to the
> > > running state of the engine (due to (1) above). This means subsequent
> > > calls to guc_engine_busyness() will think that the engine is still
> > > running and they will keep updating the cached counter (stats->total).
> > > This results in an inflated cached counter.
> > >
> > > Later, when the application runs a workload and queries for busyness, we
> > > return the cached value since it is larger than the actual value (due to
> > > (2) above).
> > >
> > > All subsequent queries will return the same large (inflated) value, so
> > > the application sees a delta busyness of zero.
> > >
> > > In order to fix the issue:
> > > - reset the running state of engines in busyness_park outside of the
> > > throttling code.
> > > - when busyness is queried and a PM wakeref is not active, do not
> > > calculate active engine time since we do not expect engines to be
> > > active without a wakeref.
> > >
> > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13366
> > > Fixes: cf907f6d2942 ("i915/guc: Ensure busyness counter increases motonically")
> > > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
> > > ---
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 18 +++++++++++++++++-
> > >  1 file changed, 17 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 12f1ba7ca9c1..b7a831e1fc85 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -1372,7 +1372,7 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
> > >  	}
> > >
> > >  	total = intel_gt_clock_interval_to_ns(gt, stats->total_gt_clks);
> > > -	if (stats->running) {
> > > +	if (wakeref && stats->running) {
> >
> > do you really need to check this wakeref if you are now setting running to
> > false earlier?
>
> Maybe not. I did try it without this wakeref and that passes too.
>
> >
> > And if we do, what about moving this entire block to inside the
> > existing if (wakeref) ?!
>
> Yes, looks like I could move it inside the existing wakeref check block, but
> I think I will drop this check since running is being set to false in the
> gt_park code path.
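
Sounds good. Just to spell out the direction for v2 then, a minimal sketch of
how this block could end up looking, assuming the park path clears
stats->running as in the hunk further below (untested, essentially the
pre-patch code plus a comment):

	total = intel_gt_clock_interval_to_ns(gt, stats->total_gt_clks);
	if (stats->running) {
		/*
		 * No extra wakeref test needed here: intel_guc_busyness_park()
		 * now clears engine->stats.guc.running, so a query issued
		 * after the GT has parked no longer adds a stale in-flight
		 * delta to the cached total.
		 */
		u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk;

		total += intel_gt_clock_interval_to_ns(gt, clk);
	}
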
>
> >
> > >  		u64 clk = guc->timestamp.gt_stamp - stats->start_gt_clk;
> > >
> > >  		total += intel_gt_clock_interval_to_ns(gt, clk);
> > > @@ -1469,6 +1469,19 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc)
> > >  	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> > >  }
> > >
> > > +static void __update_guc_busyness_running_state(struct intel_guc *guc)
> > > +{
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	struct intel_engine_cs *engine;
> > > +	enum intel_engine_id id;
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&guc->timestamp.lock, flags);
> > > +	for_each_engine(engine, gt, id)
> > > +		engine->stats.guc.running = false;
> > > +	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> > > +}
> > > +
> > >  static void __update_guc_busyness_stats(struct intel_guc *guc)
> > >  {
> > >  	struct intel_gt *gt = guc_to_gt(guc);
> > > @@ -1619,6 +1632,9 @@ void intel_guc_busyness_park(struct intel_gt *gt)
> > >  	if (!guc_submission_initialized(guc))
> > >  		return;
> > >
> > > +	/* Assume no engines are running and set running state to false */
> > > +	__update_guc_busyness_running_state(guc);
> > > +
> >
> > Why not move the entire __reset_guc_busyness_stats earlier then?
>
> Not sure what you mean by __reset_guc_busyness_stats because that is only
> called when reset is in progress.
>
> Do you mean __update_guc_busyness_stats()?
>
> Only the running state needs to be updated for every gt_park. Updates to
> other stats can be throttled based on the jiffies code in
> intel_guc_busyness_park().

ah okay then, please ignore me in this last point...

>
> Thanks,
> Umesh
>
> >
> > >  	/*
> > >  	 * There is a race with suspend flow where the worker runs after suspend
> > >  	 * and causes an unclaimed register access warning. Cancel the worker
> > > --
> > > 2.34.1
> > >
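
For completeness, the park-time ordering as I read it with this patch applied:
only the clearing of the running state happens on every park, while the heavier
stats sampling stays behind the existing throttle. Sketch only; the guc lookup,
the worker cancel and the jiffies check are paraphrased in comments rather than
copied from the source:

	void intel_guc_busyness_park(struct intel_gt *gt)
	{
		struct intel_guc *guc = &gt->uc.guc;

		if (!guc_submission_initialized(guc))
			return;

		/*
		 * Always clear engine->stats.guc.running on park so that a
		 * later guc_engine_busyness() query cannot keep inflating the
		 * cached total while the GT is idle.
		 */
		__update_guc_busyness_running_state(guc);

		/*
		 * Existing behaviour (paraphrased): cancel the stats worker to
		 * avoid the suspend-time race, and return early if busyness
		 * was sampled recently, so __update_guc_busyness_stats() does
		 * not hog PCI bandwidth on every park.
		 */

		__update_guc_busyness_stats(guc);
	}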