Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Quoting Mika Kuoppala (2019-11-11 10:54:14)
>> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>>
>> > If we detect a hang in a closed context, just flush all of its requests
>> > and cancel any remaining execution along the context. Note that after
>> > closing the context, the last reference to the context may be dropped,
>> > leaving it only valid under RCU.
>>
>> Sounds good. But is there a window for userspace to start
>> to see -EIO if it resubmits to a closed context?
>
> Userspace cannot submit to a closed context (-ENOENT) as that would be
> tantamount to a use-after-free kernel bug.
>
>> In other words, after userspace doing gem_ctx_destroy(ctx_handle),
>> we would return -EINVAL due to ctx_handle being stale
>> earlier than we check for banned status and return -EIO?
>
> It's as simple as if the context is closed, it is removed from the
> file->context_idr and userspace cannot access it. If userspace is racing
> with itself, there's not much we can do other than protect our
> references. If userspace succeeds in submitting to the context prior to
> closing it in another thread, it has the context to continue (and if
> it then hangs, it will be shot down immediately). If it loses that race, it
> gets an -ENOENT. If it loses that race so badly the context id is
> replaced by a new context, it submits to that new context; which surely
> will end in tears and GPU hangs, but not our fault and nothing we can do
> to prevent that.

Let them shed tears if they bring it on themselves.

I was concerned about a behavioural change in the close/resubmit race.
But as you explained, when racing on a different id they deserve
what they begged for. We are in the business of protecting
the state of all the sane ones.

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
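
For readers following the thread, the three outcomes of the close/resubmit race described above (submit succeeds, submit gets -ENOENT, submit lands on a new context that reused the id) can be sketched with a toy userspace model. This is not i915 code. The names toy_file, toy_ctx, toy_lookup, toy_ctx_create, toy_ctx_destroy and toy_submit are hypothetical, a fixed array stands in for file->context_idr, and the lowest-free-slot id reuse is an assumption chosen only to make the reuse case easy to show. Locking, RCU and reference counting are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CTX 4

/* Hypothetical stand-in for a GEM context. */
struct toy_ctx {
	bool in_use;       /* handle is currently published in the table */
	unsigned int gen;  /* bumped on every create, tells incarnations apart */
	int submitted;     /* requests accepted by this incarnation */
};

/* Hypothetical stand-in for the per-file handle table (file->context_idr). */
struct toy_file {
	struct toy_ctx table[TOY_MAX_CTX];
};

/* Return the live context for a handle, or NULL if the handle is stale. */
static struct toy_ctx *toy_lookup(struct toy_file *f, int id)
{
	assert(f != NULL);
	if (id < 0 || id >= TOY_MAX_CTX || !f->table[id].in_use)
		return NULL;
	return &f->table[id];
}

/* Publish a new context in the lowest free slot; returns its handle. */
static int toy_ctx_create(struct toy_file *f)
{
	for (int id = 0; id < TOY_MAX_CTX; id++) {
		if (!f->table[id].in_use) {
			f->table[id].in_use = true;
			f->table[id].gen++;
			f->table[id].submitted = 0;
			return id;
		}
	}
	return -ENOSPC;
}

/* Closing removes the handle from the table, so later lookups fail. */
static int toy_ctx_destroy(struct toy_file *f, int id)
{
	struct toy_ctx *ctx = toy_lookup(f, id);

	if (!ctx)
		return -ENOENT;
	ctx->in_use = false;
	return 0;
}

/* Submission starts with a lookup; a stale handle never reaches a context. */
static int toy_submit(struct toy_file *f, int id)
{
	struct toy_ctx *ctx = toy_lookup(f, id);

	if (!ctx)
		return -ENOENT;
	ctx->submitted++;
	return 0;
}
```

In this model, a submit issued before toy_ctx_destroy() is accepted, a submit issued after it returns -ENOENT, and a submit issued after a later toy_ctx_create() has reused the same id is accepted by the new incarnation (gen has changed), which mirrors the "loses that race so badly" case in the quoted reply.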