Quoting Tvrtko Ursulin (2020-02-05 18:33:57) > > On 05/02/2020 16:44, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2020-02-05 16:22:34) > >> On 05/02/2020 12:13, Chris Wilson wrote: > >>> If we have a set of active engines marked as being non-persistent, we > >>> lose track of those if the user replaces those engines with > >>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that > >>> non-persistent requests are terminated if they are no longer being > >>> tracked by the user's context (in order to prevent a lost request > >>> causing an untracked and so unstoppable GPU hang), we need to apply the > >>> same context cancellation upon changing engines. > >>> > >>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional") > >>> Testcase: XXX > >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>> --- > >>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++ > >>> 1 file changed, 7 insertions(+) > >>> > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> index 52a749691a8d..20f1d3e0221f 100644 > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx, > >>> > >>> replace: > >>> mutex_lock(&ctx->engines_mutex); > >>> + > >>> + /* Flush stale requests off the old engines if required */ > >>> + if (!i915_gem_context_is_persistent(ctx) || > >>> + !i915_modparams.enable_hangcheck) > >>> + kill_context(ctx); > >> > >> Is the negative effect of this is legit contexts can't keep submitting > >> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but > >> still. Might break legitimate userspace. Not that I offer solutions.. :( > >> Banning changing engines once context went non-persistent? That too can > >> break someone. > > > > It closes the hole we have. To do otherwise, we need to keep track of > > the old engines. Not an impossible task, certainly inconvenient. > > > > struct old_engines { > > struct i915_active active; > > struct list_head link; > > struct i915_gem_context *ctx; > > void *engines; > > int num_engines; > > }; > > > > With a list+spinlock in the ctx that we can work in kill_context. > > > > The biggest catch there is actually worrying about attaching the active > > to already executing request, and making sure the coupling doesn't bug > > on a concurrent completion. Hmm, it's just a completion callback, but > > more convenient to use a ready made one. > > What would you do with old engines? We don't have a mechanism to mark > intel_context closed. Hm, right, it would get unreachable by definition. > But how to terminate it if it doesn't play nicely? Wait 30 minutes (if it passes the tests) and you'll find out :) -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx