Hi Nirmoy,

On Mon, Jun 05, 2023 at 10:10:21PM +0200, Nirmoy Das wrote:
> Ensure correct handling of closed VMAs on multi-gt platforms to prevent
> Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
> exclusively added to GT0's closed_vma link (gt->closed_vma) and
> subsequently freed by i915_vma_parked(), which assumes the entire GPU is
> idle. However, on platforms with multiple GTs, such as MTL, GT1 may
> remain active while GT0 is idle. This causes GT0 to mistakenly consider
> the closed VMAs in its closed_vma list as unnecessary, potentially
> leading to Use-After-Free issues if a job for GT1 attempts to access a
> freed VMA.
>
> Although we do take a wakeref for GT0 but it happens later, after
> evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
> early.

Hooray! This is great, Nirmoy! I will give it a shot.

> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> Cc: Chris Wilson <chris.p.wilson@xxxxxxxxx>
> Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Cc: Andrzej Hajda <andrzej.hajda@xxxxxxxxx>
> Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5fb459ea4294..adcf8837dfe6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2692,6 +2692,7 @@ static int
>  eb_select_engine(struct i915_execbuffer *eb)
>  {
>  	struct intel_context *ce, *child;
> +	struct intel_gt *gt;
>  	unsigned int idx;
>  	int err;
>
> @@ -2715,10 +2716,16 @@ eb_select_engine(struct i915_execbuffer *eb)
>  		}
>  	}
>  	eb->num_batches = ce->parallel.number_children + 1;
> +	gt = ce->engine->gt;
>
>  	for_each_child(ce, child)
>  		intel_context_get(child);
>  	intel_gt_pm_get(ce->engine->gt);
> +	/* Keep GT0 active on MTL so that i915_vma_parked() doesn't
> +	 * free VMAs while execbuf ioctl is validating VMAs.
> +	 */
> +	if (gt != to_gt(gt->i915))

You can use gt->info.id here.

> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
>
>  	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>  		err = intel_context_alloc_state(ce);
> @@ -2757,6 +2764,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>  	return err;
>
>  err:
> +	if (ce->engine->gt != to_gt(ce->engine->gt->i915))

	if (gt->info.id)

gt is already ce->engine->gt.

> +		intel_gt_pm_get(to_gt(ce->engine->gt->i915));
> +
>  	intel_gt_pm_put(ce->engine->gt);
>  	for_each_child(ce, child)
>  		intel_context_put(child);
> @@ -2770,6 +2780,8 @@ eb_put_engine(struct i915_execbuffer *eb)
>  	struct intel_context *child;
>
>  	i915_vm_put(eb->context->vm);
> +	if (eb->gt != to_gt(eb->gt->i915))
> +		intel_gt_pm_put(to_gt(eb->gt->i915));

This wakeref going up and down is a bit ugly... Perhaps we can
add some flag about the GT type in the info structure. MTL is a
weird multi-gt platform and, indeed, you can't shut down GT0
without affecting GT1.

For now it's OK, though, for testing.

Andi

>  	intel_gt_pm_put(eb->gt);
>  	for_each_child(eb->context, child)
>  		intel_context_put(child);
> --
> 2.39.0
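
For reference, here is a minimal userspace sketch of the two suggestions above: testing the GT id instead of comparing pointers, and pairing the extra GT0 get with exactly one put on every exit path. Everything named mock_* is hypothetical and only stands in for the driver types; it is not the i915 code. The sketch drops the extra reference on the error path, whereas the quoted err: hunk calls intel_gt_pm_get() there, which looks like it would leave the GT0 count unbalanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct intel_gt; not the real i915 definition. */
struct mock_gt {
	unsigned int id;         /* plays the role of gt->info.id, 0 for GT0 */
	int wakerefs;            /* simplified wakeref count */
	struct mock_gt *primary; /* plays the role of to_gt(gt->i915) */
};

static void mock_gt_pm_get(struct mock_gt *gt)
{
	gt->wakerefs++;
}

static void mock_gt_pm_put(struct mock_gt *gt)
{
	/* A put without a matching get is a bug in the caller. */
	assert(gt->wakerefs > 0);
	gt->wakerefs--;
}

/* Take the engine's GT wakeref, plus one on GT0 when the engine is elsewhere. */
static void mock_eb_get_gt(struct mock_gt *gt)
{
	mock_gt_pm_get(gt);
	if (gt->id)
		mock_gt_pm_get(gt->primary);
}

/* Exact mirror of mock_eb_get_gt(); shared by the error and release paths. */
static void mock_eb_put_gt(struct mock_gt *gt)
{
	if (gt->id)
		mock_gt_pm_put(gt->primary);
	mock_gt_pm_put(gt);
}

/* Returns 0 with the wakerefs held, or -1 with everything released again. */
static int mock_eb_select_engine(struct mock_gt *gt, bool fail)
{
	mock_eb_get_gt(gt);
	if (fail) {
		mock_eb_put_gt(gt);
		return -1;
	}
	return 0;
}
```

Keeping the get and the put in one pair of helpers is one way to avoid the condition being open-coded three times; a flag in the info structure, as suggested above, would replace the `gt->id` test inside those two helpers only.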