On 25/09/2020 11:05, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-09-24 15:26:56)
On 16/09/2020 10:42, Chris Wilson wrote:
Verify that if a context is active at the time it is closed, that it is
either persistent and preemptible (with hangcheck running) or it shall
be removed from execution.
Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-close
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v5.7+
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 48 +++++----------------
1 file changed, 10 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index a548626fa8bc..4fd38101bb56 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -390,24 +390,6 @@ __context_engines_static(const struct i915_gem_context *ctx)
return rcu_dereference_protected(ctx->engines, true);
}
-static bool __reset_engine(struct intel_engine_cs *engine)
-{
- struct intel_gt *gt = engine->gt;
- bool success = false;
-
- if (!intel_has_reset_engine(gt))
- return false;
-
- if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
- >->reset.flags)) {
- success = intel_engine_reset(engine, NULL) == 0;
- clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
- >->reset.flags);
- }
-
- return success;
-}
-
static void __reset_context(struct i915_gem_context *ctx,
struct intel_engine_cs *engine)
{
@@ -431,12 +413,7 @@ static bool __cancel_engine(struct intel_engine_cs *engine)
* kill the banned context, we fallback to doing a local reset
* instead.
*/
- if (IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT) &&
- !intel_engine_pulse(engine))
- return true;
-
- /* If we are unable to send a pulse, try resetting this engine. */
- return __reset_engine(engine);
+ return intel_engine_pulse(engine) == 0;
Where is the path now which actually resets the currently running
workload (engine) of a non-persistent context? Pulse will be sent and
then rely on hangcheck but otherwise let it run?
If the pulse fails, we just call intel_handle_error() on the engine. On
looking at this code again, I could not justify the open-coding of the
engine reset fragment of the general error handler, especially as we end
up taking that route anyway for device resets. (Unlike inside the
tasklet, there's no atomicity concerns on this engine-reset path.)
I think yesterday I got lost in boolean logic and flows here. Today it
looks fine. Bool ban will be true and code will indeed enter the
__kill_context path.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Regards,
Tvrtko