On 9/29/2022 00:42, Tvrtko Ursulin wrote:
On 29/09/2022 03:18, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>
Compute workloads are inherently not pre-emptible for long periods on
current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable with GuC
submission as it prevents per engine reset of hung contexts. Hence the
next patch will re-enable the timeout but bumped up by an order of
magnitude.
However, the heartbeat might not respect that. Depending upon current
activity, a pre-emption to the heartbeat pulse might not even be
attempted until the last heartbeat period. Which means that only one
period is granted for the pre-emption to occur. With the aforesaid
bump, the pre-emption timeout could be significantly larger than this
heartbeat period.
So adjust the heartbeat code to take the pre-emption timeout into
account. When it reaches the final (high priority) period, it now
ensures the delay before hitting reset is bigger than the pre-emption
timeout.
v2: Fix for selftests which adjust the heartbeat period manually.
Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
---
.../gpu/drm/i915/gt/intel_engine_heartbeat.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a3698f611f457..823a790a0e2ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -22,9 +22,28 @@
static bool next_heartbeat(struct intel_engine_cs *engine)
{
+ struct i915_request *rq;
long delay;
delay = READ_ONCE(engine->props.heartbeat_interval_ms);
+
+ rq = engine->heartbeat.systole;
+
+ if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
+ delay == engine->defaults.heartbeat_interval_ms) {
Maybe I forgot but what is the reason for the check against the
default heartbeat interval?
That's the 'v2: fix for selftests that manually adjust the heartbeat'.
If something (or someone) has explicitly set an override of the
heartbeat then it has to be assumed that they know what they are doing,
and if things don't work any more that's their problem. But if we don't
respect their override then they won't get the timings they expect and
the selftest will fail.
John.
Regards,
Tvrtko
+ long longer;
+
+ /*
+ * The final try is at the highest priority possible. Up
until now
+ * a pre-emption might not even have been attempted. So make
sure
+ * this last attempt allows enough time for a pre-emption to
occur.
+ */
+ longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2;
+ longer = intel_clamp_heartbeat_interval_ms(engine, longer);
+ if (longer > delay)
+ delay = longer;
+ }
+
if (!delay)
return false;