Re: [PATCH 07/25] drm/i915: Cancel context if it hangs after it is closed

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 11 Nov 2019 11:04:33 +0000

Quoting Mika Kuoppala (2019-11-11 10:54:14)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> 
> > If we detect a hang in a closed context, just flush all of its requests
> > and cancel any remaining execution along the context. Note that after
> > closing the context, the last reference to the context may be dropped,
> > leaving it only valid under RCU.
> 
> Sounds good. But is there a window for userspace to start
> to see -EIO if it resubmits to a closed context?

Userspace cannot submit to a closed context (-ENOENT) as that would be
tantamount to a use-after-free kernel bug.

> In other words, after userspace doing gem_ctx_destroy(ctx_handle),
> we would return -EINVAL due to ctx_handle being stale
> earlier than we check for banned status and return -EIO?

It's as simple as if the context is closed, it is removed from the
file->context_idr and userspace cannot access it. If userspace is racing
with itself, there's not much we can do other than protect our
references. If userspace succeeds in submitting to the context prior to
closing it in another thread, it has the context to continue (and if it
then hangs, it will be shot down immediately). If it loses that race, it
gets an -ENOENT. If it loses that race so badly that the context id is
replaced by a new context, it submits to that new context, which surely
will end in tears and GPU hangs, but that is not our fault and there is
nothing we can do to prevent it.
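
To make the three outcomes concrete, here is a rough single-threaded
userspace model of the lookup/close ordering. It is only a sketch and not
the i915 code: struct ctx, struct file_table and every function name below
are hypothetical, the table merely plays the role of file->context_idr,
and real locking/RCU is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CTX 4

/* Hypothetical stand-in for a GPU context; not the i915 structure. */
struct ctx {
	int refcount;	/* references held by the table and by users */
	int closed;	/* set once userspace has destroyed the handle */
};

/* Hypothetical per-file handle table, playing the role of context_idr. */
struct file_table {
	struct ctx *slot[MAX_CTX];
};

/* Install a context under the lowest free id; returns the id or -ENOSPC. */
static int table_create(struct file_table *t, struct ctx *c)
{
	for (int id = 0; id < MAX_CTX; id++) {
		if (!t->slot[id]) {
			c->refcount = 1;	/* the table's own reference */
			c->closed = 0;
			t->slot[id] = c;
			return id;
		}
	}
	return -ENOSPC;
}

/* Drop one reference; the model only checks that we never underflow. */
static void ctx_put(struct ctx *c)
{
	assert(c->refcount > 0);
	c->refcount--;
}

/* Look up an id and take a reference; a removed id cannot be found. */
static struct ctx *table_lookup(struct file_table *t, int id)
{
	if (id < 0 || id >= MAX_CTX || !t->slot[id])
		return NULL;
	t->slot[id]->refcount++;
	return t->slot[id];
}

/* Close: unpublish the id first, then drop the table's reference. */
static int table_close(struct file_table *t, int id)
{
	struct ctx *c;

	if (id < 0 || id >= MAX_CTX || !t->slot[id])
		return -ENOENT;
	c = t->slot[id];
	t->slot[id] = NULL;	/* later lookups of this id now fail */
	c->closed = 1;
	ctx_put(c);
	return 0;
}

/* Submit: report which context received the work, or -ENOENT. */
static int submit(struct file_table *t, int id, struct ctx **target)
{
	struct ctx *c = table_lookup(t, id);

	if (!c)
		return -ENOENT;	/* lost the race against close */
	*target = c;		/* whichever context currently owns the id */
	ctx_put(c);
	return 0;
}
```

In this model, a submit() before table_close() lands on the original
context, a submit() after it returns -ENOENT, and a submit() after the id
has been handed out again by table_create() lands on the new context.
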
-Chris