Re: [PATCH 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

On 8/16/2022 16:55, Teres Alexis, Alan Previn wrote:
On Tue, 2022-08-16 at 15:45 -0700, Harrison, John C wrote:
On 8/15/2022 19:17, Alan Previn wrote:
From: Matthew Brost <matthew.brost@xxxxxxxxx>

Add a delay, configurable via debugfs (default 34ms), to disable
[...] (default is 3/4) of the guc_ids are in use.

my bad - got clipped - will fix.

--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,7 +65,13 @@
    * corresponding G2H returns indicating the scheduling disable operation has
    * completed it is safe to unpin the context. While a disable is in flight it
    * isn't safe to resubmit the context so a fence is used to stall all future
- * requests of that context until the G2H is returned.
+ * requests of that context until the G2H is returned. Because this interaction
+ * with the GuC takes a non-zero amount of time we delay the disabling of
+ * scheduling after the pin count goes to zero by a configurable period of time
+ * (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of
+ * time to resubmit something on the context before doing this costly operation.
+ * This delay is only done if the context isn't closed and the guc_id usage is
+ * less than a threshold (default 3/4, see NUM_SCHED_DISABLE_GUC_IDS_THRESHOLD)
The delay text has dropped the 'default 34ms' in preference for just
referencing the define but the threshold still mentions both. Is that
intentional? Dropping the values seems sensible - no need to update the
comment if the numeric value is changed at some later point.

Yes, intentional - and yes, I should be consistent for both. Will fix.
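
(For readers following the archive: the mechanism under discussion boils down to
roughly the sketch below. Every name prefixed "sketch_" is a hypothetical
stand-in, not the actual patch code - the real series wires this into the i915
context lifecycle.)

      #include <linux/workqueue.h>
      #include <linux/jiffies.h>

      #define SCHED_DISABLE_DELAY_MS	34	/* value from the patch */

      struct intel_guc;

      /* Hypothetical context wrapper carrying the delayed-disable work.
       * INIT_DELAYED_WORK(&ce->sched_disable_work, sched_disable_worker)
       * would be done once at context init. */
      struct sketch_ctx {
              struct delayed_work sched_disable_work;
              struct intel_guc *guc;
      };

      /* Stand-ins for the real i915/GuC plumbing. */
      bool sketch_context_closed(struct sketch_ctx *ce);
      int sketch_guc_ids_in_use(struct intel_guc *guc);
      int sketch_guc_ids_threshold(struct intel_guc *guc);
      void sketch_guc_context_sched_disable(struct sketch_ctx *ce);

      static void sched_disable_worker(struct work_struct *wrk)
      {
              struct sketch_ctx *ce = container_of(wrk, typeof(*ce),
                                                   sched_disable_work.work);

              /* Pin count stayed at zero for the whole window: pay the
               * costly H2G/G2H round trip now. */
              sketch_guc_context_sched_disable(ce);
      }

      static void sketch_context_unpinned(struct sketch_ctx *ce)
      {
              if (!sketch_context_closed(ce) &&
                  sketch_guc_ids_in_use(ce->guc) < sketch_guc_ids_threshold(ce->guc))
                      /* Defer the disable; a resubmit can still cancel it. */
                      mod_delayed_work(system_unbound_wq, &ce->sched_disable_work,
                                       msecs_to_jiffies(SCHED_DISABLE_DELAY_MS));
              else
                      /* Closed or under guc_id pressure: disable immediately. */
                      sketch_guc_context_sched_disable(ce);
      }

      static void sketch_context_repinned(struct sketch_ctx *ce)
      {
              /* Resubmitted inside the window: skip the disable entirely. */
              cancel_delayed_work(&ce->sched_disable_work);
      }
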

    *
@@ -4207,6 +4288,20 @@ static bool __guc_submission_selected(struct intel_guc *guc)
+int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc)
+{
+	return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
+}
+
+#define SCHED_DISABLE_DELAY_MS	34
+/*
+ * This default value of 33 millisecs (+1 millisec buffer) ensures 33fps workloads are
+ * able to enjoy the latency reduction when delaying the schedule-disable operation
+ */
+#define NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(__guc) \
+	(((intel_guc_sched_disable_gucid_threshold_max(__guc)) * 3) / 4)
+/* Default threshold to bypass delay-schedule-disable when under guc-id pressure */
+
Comments always go in front of the thing they are describing, not after.
Will fix.

Also, the description reads as just a more verbose version of the
#define. As in, more words but no extra information. Not sure what more
can be said about the threshold. I'm not aware of any empirical or
theoretical evidence as to why 75% is a good number beyond 'it just
seems sensible'.
Yes - this was merely a sensible choice that wasn't driven by any known performance issue with real-world workloads
(considering we have thousands of guc-ids at our disposal). Will fix (remove the comment since it's self-descriptive
anyway).
It should have some comment. Even if it is just "75% seems like a reasonable starting point, real world apps generally don't get anywhere near this".
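
(Purely illustrative arithmetic with made-up counts - the real numbers come from
the GuC submission state at init:)

      /* Say num_guc_ids = 4096, of which 1024 are reserved for multi-LRC. */
      int threshold_max = 4096 - 1024;        /* 3072 single-LRC ids available */
      int default_threshold = (3072 * 3) / 4; /* 2304: past this, skip the delay */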


The 'make it work for 33fps' seems quite arbitrary and
magical, though. What is so special about 33fps? I feel like it should
at least mention that a real-world use case requires 33fps (was it
really 33 not 30?)
The patch comment already includes the workload description: game-rendering + encoding at 30 fps (not as fast as
possible but with latency deadlines at that fps). But I can include it again in this #define if you prefer:
(would this work?)
      /*
       * This default value of 33 millisecs (+1 millisec buffer) ensures workloads at
       * 33fps or higher are able to enjoy the latency reduction when delaying the
       * schedule-disable operation. This matches the 30fps game-render + encode
       * (real world) workload this knob was tested against.
       */
      #define SCHED_DISABLE_DELAY_MS	34
Sounds plausible. Although technically, 34 is not a +1 buffer; it is rounding up the 1/3ms that we can't fit in an integer variable. And it should be "30fps or higher" given that the actual workload is 30fps. Supporting only >= 33fps would be insufficient.
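
(In other words, the default could even be derived rather than hard-coded, using
the kernel's DIV_ROUND_UP helper - TARGET_FPS here is a hypothetical name:)

      #define TARGET_FPS		30
      #define SCHED_DISABLE_DELAY_MS	DIV_ROUND_UP(1000, TARGET_FPS)	/* 33.33ms -> 34 */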


and that anything faster (e.g. 60fps) will automatically be covered.
Is this really necessary to state? I think that is an obvious inference if we know anything about fps and periods.
Fair enough.

And that in the other direction, ~30ms is not
so slow that it leads to large numbers of idle contexts floating around.
Did we have any specific reasons against going larger? I recall the
original experiments were with 100ms. Was there a specific reason for
rejecting that value?
In some initial internal conversations, there was some concern about keeping contexts pinned and a possible impact on
the GuC subsystem's power residency - but that was later discounted. So at this point we really can keep any number we
want, but since we already did the work testing both 100ms and 34ms, I thought it would be better to stick with 34ms
simply because of its clear relationship to the rate of context submission for the actual workload that was tested
(i.e. 30fps workloads whose submissions arrive 33.333 millisecs apart). That said, would the above comment mod work?
Okay. Sounds good.

John.




