Re: [PATCH v2] drm/i915: Fix a VMA UAF for multi-gt platform

Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> · Tue, 6 Jun 2023 22:56:02 +0200

Hi Nirmoy,

On Tue, Jun 06, 2023 at 10:27:55PM +0200, Nirmoy Das wrote:
> Ensure correct handling of closed VMAs on multi-gt platforms to prevent
> Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
> exclusively added to GT0's closed_vma link (gt->closed_vma) and
> subsequently freed by i915_vma_parked(), which assumes the entire GPU is
> idle. However, on platforms with multiple GTs, such as MTL, GT1 may
> remain active while GT0 is idle. This causes GT0 to mistakenly consider
> the closed VMAs in its closed_vma list as unnecessary, potentially
> leading to Use-After-Free issues if a job for GT1 attempts to access a
> freed VMA.
> 
> Although we do take a wakeref for GT0 but it happens later, after
> evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
> early.
> 
> v2: Use gt id to detect multi-tile(Andi)
>     Fix the incorrect error path.
> 
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> Cc: Chris Wilson <chris.p.wilson@xxxxxxxxx>
> Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Cc: Andrzej Hajda <andrzej.hajda@xxxxxxxxx>
> Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@xxxxxxxxx>
> Tested-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>

I wonder if we need any Fixes tag here, maybe this:

Fixes: d93939730347 ("drm/i915: Remove the vma refcount")

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 3aeede6aee4d..c2a67435acfa 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2683,6 +2683,7 @@ static int
>  eb_select_engine(struct i915_execbuffer *eb)
>  {
>  	struct intel_context *ce, *child;
> +	struct intel_gt *gt;
>  	unsigned int idx;
>  	int err;
>  
> @@ -2706,10 +2707,16 @@ eb_select_engine(struct i915_execbuffer *eb)
>  		}
>  	}
>  	eb->num_batches = ce->parallel.number_children + 1;
> +	gt = ce->engine->gt;
>  
>  	for_each_child(ce, child)
>  		intel_context_get(child);
>  	intel_gt_pm_get(ce->engine->gt);
> +	/* Keep GT0 active on MTL so that i915_vma_parked() doesn't
> +	 * free VMAs while execbuf ioctl is validating VMAs.
> +	 */
> +	if (gt->info.id)
> +		intel_gt_pm_get(to_gt(gt->i915));
>  
>  	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>  		err = intel_context_alloc_state(ce);
> @@ -2748,6 +2755,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>  	return err;
>  
>  err:
> +	if (gt->info.id)
> +		intel_gt_pm_put(to_gt(gt->i915));
> +
>  	intel_gt_pm_put(ce->engine->gt);
>  	for_each_child(ce, child)
>  		intel_context_put(child);
> @@ -2761,6 +2771,8 @@ eb_put_engine(struct i915_execbuffer *eb)
>  	struct intel_context *child;
>  
>  	i915_vm_put(eb->context->vm);
> +	if (eb->gt->info.id)
> +		intel_gt_pm_put(to_gt(eb->gt->i915));
>  	intel_gt_pm_put(eb->gt);

I would add a comment up here, just not to scare people when they
see this.

Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> 

Andi

>  	for_each_child(eb->context, child)
>  		intel_context_put(child);
> -- 
> 2.39.0