On 05/02/2020 16:44, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
On 05/02/2020 12:13, Chris Wilson wrote:
If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.
Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: XXX
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..20f1d3e0221f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
replace:
mutex_lock(&ctx->engines_mutex);
+
+ /* Flush stale requests off the old engines if required */
+ if (!i915_gem_context_is_persistent(ctx) ||
+ !i915_modparams.enable_hangcheck)
+ kill_context(ctx);
Is the negative effect of this is legit contexts can't keep submitting
and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
still. Might break legitimate userspace. Not that I offer solutions.. :(
Banning changing engines once context went non-persistent? That too can
break someone.
It closes the hole we have. To do otherwise, we need to keep track of
the old engines. Not an impossible task, certainly inconvenient.
struct old_engines {
struct i915_active active;
struct list_head link;
struct i915_gem_context *ctx;
void *engines;
int num_engines;
};
With a list+spinlock in the ctx that we can work in kill_context.
The biggest catch there is actually worrying about attaching the active
to already executing request, and making sure the coupling doesn't bug
on a concurrent completion. Hmm, it's just a completion callback, but
more convenient to use a ready made one.
What would you do with old engines? We don't have a mechanism to mark
intel_context closed. Hm, right, it would get unreachable by definition.
But how to terminate it if it doesn't play nicely?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx