On Fri, Apr 12, 2024 at 03:42:00PM +0100, Matthew Auld wrote:
> On 12/04/2024 15:06, Lucas De Marchi wrote:
> > On Fri, Apr 12, 2024 at 12:31:45PM +0100, Matthew Auld wrote:
> > > The asid is only erased from the xarray when the vm refcount reaches
> > > zero, however this leads to potential UAF since the xe_vm_get() only
> > 
> > I'm not sure I understand the call chain and where xe_vm_get() is coming
> > into play here.
> > 
> > > works on a vm with refcount != 0. Since the asid is allocated in the vm
> > > create ioctl, rather erase it when closing the vm, prior to dropping the
> > > potential last ref. This should also work when user closes driver fd
> > > without explicit vm destroy.
> > 
> > what seems weird is that you are moving it earlier in the call stack
> > rather than later, outside of the worker, to prevent the UAF.
> > 
> > what exactly was the UAF on?
> 
> UAF on the vm object. From the bug report it's when servicing some GPU
> fault, so inside handle_pagefault(). At this stage it just has some asid
> which is meant to map to some vm AFAICT. The lookup dance relies on calling
> xe_vm_get() after getting back the vm pointer from the xarray. Currently
> the asid is only erased from the xarray in vm_destroy_work_func() which is
> long after the refcount reaches zero and we are about to free the memory
> for the vm.
> 
> However xe_vm_get() is only meant to be called if you are already holding
> a ref, so if the vm refcount is already zero it just throws an error and
> continues on, and the caller has no idea. If that happens then as soon as
> we drop the usm lock the memory can be freed and it's game over. This
> looks to be what happens with the vm refcount reaching zero, and the
> handle_pagefault() still being able to reach the soon-to-be-freed vm via
> the xarray. With this patch we now erase from the xarray before we drop
> what is potentially the final ref. That way if you can reach the vm via
> the xarray you should always be able to get a valid ref.

This explanation makes sense to me. Thanks for the fix.

Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx>

> > Lucas De Marchi
> > 
> > > 
> > > Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
> > > Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> > > Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > Cc: <stable@xxxxxxxxxxxxxxx> # v6.8+
> > > ---
> > >  drivers/gpu/drm/xe/xe_vm.c | 21 +++++++++++----------
> > >  1 file changed, 11 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index a196dbe65252..c5c26b3d1b76 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -1581,6 +1581,16 @@ void xe_vm_close_and_put(struct xe_vm *vm)
> > >  		xe->usm.num_vm_in_fault_mode--;
> > >  	else if (!(vm->flags & XE_VM_FLAG_MIGRATION))
> > >  		xe->usm.num_vm_in_non_fault_mode--;
> > > +
> > > +	if (vm->usm.asid) {
> > > +		void *lookup;
> > > +
> > > +		xe_assert(xe, xe->info.has_asid);
> > > +		xe_assert(xe, !(vm->flags & XE_VM_FLAG_MIGRATION));
> > > +
> > > +		lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
> > > +		xe_assert(xe, lookup == vm);
> > > +	}
> > >  	mutex_unlock(&xe->usm.lock);
> > > 
> > >  	for_each_tile(tile, xe, id)
> > > @@ -1596,24 +1606,15 @@ static void vm_destroy_work_func(struct work_struct *w)
> > >  	struct xe_device *xe = vm->xe;
> > >  	struct xe_tile *tile;
> > >  	u8 id;
> > > -	void *lookup;
> > > 
> > >  	/* xe_vm_close_and_put was not called? */
> > >  	xe_assert(xe, !vm->size);
> > > 
> > >  	mutex_destroy(&vm->snap_mutex);
> > > 
> > > -	if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
> > > +	if (!(vm->flags & XE_VM_FLAG_MIGRATION))
> > >  		xe_device_mem_access_put(xe);
> > > 
> > > -	if (xe->info.has_asid && vm->usm.asid) {
> > > -		mutex_lock(&xe->usm.lock);
> > > -		lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
> > > -		xe_assert(xe, lookup == vm);
> > > -		mutex_unlock(&xe->usm.lock);
> > > -	}
> > > -	}
> > > -
> > >  	for_each_tile(tile, xe, id)
> > >  		XE_WARN_ON(vm->pt_root[id]);
> > > 
> > > -- 
> > > 2.44.0