Quoting Chris Wilson (2020-09-16 12:42:17)
> We only allow persistent requests to remain on the GPU past the closure
> of their containing context (and process) so long as they are continuously
> checked for hangs or allow other requests to preempt them, as we need to
> ensure forward progress of the system. If we allow persistent contexts
> to remain on the system after the hangcheck mechanism is disabled,
> the system may grind to a halt. On disabling the mechanism, we sent a
> pulse along the engine to remove all executing contexts from the engine
> which would check for hung contexts -- but we did not prevent those
> contexts from being resubmitted if they survived the final hangcheck.
>
> Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
> Testcase: igt/gem_ctx_persistence/heartbeat-stop
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v5.7+

Definitely makes sense to ensure.

Acked-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>

Regards, Joonas

> ---
>  drivers/gpu/drm/i915/gt/intel_engine.h | 9 +++++++++
>  drivers/gpu/drm/i915/i915_request.c    | 5 +++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 08e2c000dcc3..7c3a1012e702 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -337,4 +337,13 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>  	return intel_engine_has_preemption(engine);
>  }
>
> +static inline bool
> +intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
> +{
> +	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
> +		return false;
> +
> +	return READ_ONCE(engine->props.heartbeat_interval_ms);
> +}
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 436ce368ddaa..0e813819b041 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -542,8 +542,13 @@ bool __i915_request_submit(struct i915_request *request)
>  	if (i915_request_completed(request))
>  		goto xfer;
>
> +	if (unlikely(intel_context_is_closed(request->context) &&
> +		     !intel_engine_has_heartbeat(engine)))
> +		intel_context_set_banned(request->context);
> +
>  	if (unlikely(intel_context_is_banned(request->context)))
>  		i915_request_set_error_once(request, -EIO);
> +
>  	if (unlikely(fatal_error(request->fence.error)))
>  		__i915_request_skip(request);
>
> --
> 2.20.1
>
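
For anyone reading along without the driver source to hand, the submit-time
decision the patch adds can be modelled in plain userspace C, roughly as
below. This is only a sketch of the control flow: the model_* structs and
functions are hypothetical stand-ins, not the i915 definitions, and the model
assumes any recorded error is fatal, which simplifies what fatal_error() does
in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures, for illustration only. */
struct model_engine {
	unsigned long heartbeat_interval_ms; /* 0 models a disabled heartbeat */
};

struct model_context {
	bool closed; /* the owning context (or process) has gone away */
	bool banned; /* requests from this context are to be cancelled */
};

struct model_request {
	struct model_context *context;
	int error;    /* 0, or a negative errno once an error is recorded */
	bool skipped; /* the payload will not be executed */
};

/* Models intel_engine_has_heartbeat(): a non-zero interval means enabled. */
bool model_engine_has_heartbeat(const struct model_engine *engine)
{
	return engine->heartbeat_interval_ms != 0;
}

/* Models the checks made on submission after the patch is applied. */
void model_request_submit(const struct model_engine *engine,
			  struct model_request *rq)
{
	assert(engine != NULL && rq != NULL && rq->context != NULL);

	/* New in the patch: a closed context with no heartbeat gets banned. */
	if (rq->context->closed && !model_engine_has_heartbeat(engine))
		rq->context->banned = true;

	/* A banned context has -EIO recorded, unless an error is already set. */
	if (rq->context->banned && rq->error == 0)
		rq->error = -EIO;

	/* Assumption: every recorded error is fatal, so the request is skipped. */
	if (rq->error != 0)
		rq->skipped = true;
}
```

In this model, a request from a closed context passes through untouched while
the heartbeat interval is non-zero, and is banned, marked -EIO and skipped the
next time it is submitted after the interval has been set to 0. Requests from
contexts that are still open are unaffected either way.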