On 15/06/2022 19:59, Lucas De Marchi wrote:
On Tue, Jun 14, 2022 at 08:07:04AM +0100, Tvrtko Ursulin wrote:
On 14/06/2022 02:10, Umesh Nerlige Ramappa wrote:
On Sat, Jun 11, 2022 at 10:27:11AM -0700, Alan Previn wrote:
Using igt's gem-create test, with additional patches to track object
creation time, it was measured that guc_update_engine_gt_clks was
being called over 188 thousand times in a span of 15 seconds
(running the test three times).
Get a jiffies sample on every trigger and ensure we skip sampling
if we are being called too soon. Use half of the ping_delay as a
safe threshold.
NOTE: with this change, the number of calls went down to just 14
over the same span of time (matching the original intent of running
about once every 24 seconds, at 19.2 MHz GT freq, per engine).
Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/intel_engine_types.h | 10 ++++++++++
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 9 +++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 2286f96f5f87..63f4ecdf1606 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -323,6 +323,16 @@ struct intel_engine_guc_stats {
* @start_gt_clk: GT clock time of last idle to active transition.
*/
u64 start_gt_clk;
+
+ /**
+ * @last_jiffies: Jiffies at last actual stats collection time
+ *
+ * We use this timestamp to ensure we don't oversample the
+ * stats because runtime power management events can trigger
+ * stats collection at much higher rates than required.
+ */
+ u64 last_jiffies;
+
};
struct intel_engine_cs {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 5a1dfacf24ea..8f8bf6e40ccb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1167,6 +1167,15 @@ static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
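The body of this hunk is trimmed in the quote above. Purely as a rough
sketch of the check the commit message describes (the struct accesses,
engine->stats.guc and guc->timestamp.ping_delay, are my assumptions and
not necessarily the posted code):

static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
{
	struct intel_engine_guc_stats *stats = &engine->stats.guc;
	struct intel_guc *guc = &engine->gt->uc.guc;

	/* Skip sampling if we were triggered again too soon. */
	if (stats->last_jiffies &&
	    time_before(jiffies, stats->last_jiffies +
			guc->timestamp.ping_delay / 2))
		return;

	stats->last_jiffies = jiffies;

	/* ... existing read of the GuC counters and GT clock
	 * accumulation would remain here ... */
}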
A user query will end up in guc_engine_busyness, which will call
guc_update_engine_gt_clks, so adding this logic here will affect accuracy.
The other place where guc_update_engine_gt_clks is called is the
ping worker, but that worker runs at 1/8th of the wrap-around time for
the gt clocks (32 bit). Last I checked, the wrap-around was at 22
seconds.
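Back-of-the-envelope, those periods follow directly from the counter width
and the GT clock. A small standalone sketch that just does the division
(the 19.2 MHz value comes from the commit message above; 195 MHz is only an
assumed higher GT clock for comparison):

#include <stdio.h>

/*
 * The engine utilisation counters are 32 bits wide and tick at the GT
 * clock; per the comment above, the ping worker is scheduled at 1/8th of
 * the wrap-around period.
 */
static void report(double gt_clk_hz)
{
	double wrap_s = (double)(1ULL << 32) / gt_clk_hz;

	printf("GT clock %6.1f MHz: wrap ~%6.1f s, ping period ~%5.1f s\n",
	       gt_clk_hz / 1e6, wrap_s, wrap_s / 8.0);
}

int main(void)
{
	report(19.2e6);		/* frequency quoted in the commit message */
	report(195e6);		/* assumed higher GT clock, for comparison */
	return 0;
}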
That leaves only the gt_park path. From what I understand, this path
runs too frequently, and updating the busyness stats there is what
causes the enormous PCI traffic (lmem accesses). Only this path
needs to be fixed: just use the same logic in
intel_guc_busyness_park() to decide whether to call
__update_guc_busyness_stats or not.
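A minimal sketch of that alternative, i.e. rate-limiting only the park path
so the query and ping paths keep full accuracy; the last_stat_jiffies field
and the ping_delay/2 threshold are illustrative assumptions, not code from
this series:

void intel_guc_busyness_park(struct intel_gt *gt)
{
	struct intel_guc *guc = &gt->uc.guc;

	cancel_delayed_work(&guc->timestamp.work);

	/*
	 * Skip the expensive stats update (lmem reads) if we parked
	 * again very recently.
	 */
	if (guc->timestamp.last_stat_jiffies &&
	    time_before(jiffies, guc->timestamp.last_stat_jiffies +
			guc->timestamp.ping_delay / 2))
		return;

	guc->timestamp.last_stat_jiffies = jiffies;
	__update_guc_busyness_stats(guc);
}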
Would not updating the driver state in park negatively impact
accuracy in some scenarios? That needs to be balanced against the
question of whether there are real-world scenarios impacted by
the update cost, or whether it is just IGT.
There is, which is what motivated
https://patchwork.freedesktop.org/series/105011/, and in parallel Alan
worked on this. I view both as orthogonal though. I used it to make
the single-word read from lmem faster, but if we can reduce
the frequency at which this code path is called, it should be even better.
Per Umesh's and your comments I'm unsure if we can... but if
there is no user monitoring the usage, should we still be calling this?
A "nobody is looking, so why are we sampling?" kind of thought.
Who did you find is doing the sampling in the real-world use case? AFAIR,
if no one is querying busyness, there would only be the GuC
ping worker, which runs extremely infrequently (to avoid some counter
overflow).
Regards,
Tvrtko
Summarizing the first patch in my series: it improved the igt case by ~50%
and a real-world case by ~12%.
Lucas De Marchi