On Mon, Aug 09, 2021 at 08:47:14PM +0200, Daniel Vetter wrote: > On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote: > > > > On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@xxxxxxxx> wrote: > > > >> gem context refcounting is another exercise in least locking design it > >> seems, where most things get destroyed upon context closure (which can > >> race with anything really). Only the actual memory allocation and the > >> locks survive while holding a reference. > >> > >> This tripped up Jason when reimplementing the single timeline feature > >> in > >> > >> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d > >> Author: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > >> Date: Thu Jul 8 10:48:12 2021 -0500 > >> > >> drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) > >> > >> We could fix the bug by holding ctx->mutex, but it's cleaner to just > > > > > > What bug is this fixing, exactly? > > Oh lol I was all busy ranting and not explaining :-) I found it while > auditing other context stuff, so that other patch has the longer > commit message with more history, but that patch is also now tied into > the vm-dercuify, so short summary: You put the syncobj in context > close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final > kref_put. Which means you're open to a use-after-free if you race > against an execbuf. ctx->vm is equally broken (but for other ioctl), > once this fix here is merged I send out the ctx->vm fix because that's > tied into the vm-dercuify now due to conflicts. CI caught more, so just explaining what I'm fixing here isn't going to be enough. The troubel is that drm_syncobj_put is now called from very awkward places, and I need to see whether we can fix that. Or whether we need more work_struct (like we have already for i915_address_space) for the final release. -Daniel > -Daniel > > > > > --Jason > > > >> > >> make the context object actually invariant over its _entire_ lifetime. > >> > >> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > >> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") > >> Cc: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > >> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > >> Cc: Matthew Brost <matthew.brost@xxxxxxxxx> > >> Cc: Matthew Auld <matthew.auld@xxxxxxxxx> > >> Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> > >> Cc: "Thomas Hellström" <thomas.hellstrom@xxxxxxxxx> > >> Cc: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx> > >> Cc: Dave Airlie <airlied@xxxxxxxxxx> > >> --- > >> drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- > >> 1 file changed, 3 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> index 754b9b8d4981..93ba0197d70a 100644 > >> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) > >> trace_i915_context_free(ctx); > >> GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); > >> > >> + if (ctx->syncobj) > >> + drm_syncobj_put(ctx->syncobj); > >> + > >> mutex_destroy(&ctx->engines_mutex); > >> mutex_destroy(&ctx->lut_mutex); > >> > >> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) > >> if (vm) > >> i915_vm_close(vm); > >> > >> - if (ctx->syncobj) > >> - drm_syncobj_put(ctx->syncobj); > >> - > >> ctx->file_priv = ERR_PTR(-EBADF); > >> > >> /* > >> -- > >> 2.32.0 > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch