Re: [PATCH 1/3] drm/xe/vm: prevent UAF with asid based lookup

Matthew Auld <matthew.auld@xxxxxxxxx> · Fri, 12 Apr 2024 15:42:00 +0100

On 12/04/2024 15:06, Lucas De Marchi wrote:
On Fri, Apr 12, 2024 at 12:31:45PM +0100, Matthew Auld wrote:
The asid is only erased from the xarray when the vm refcount reaches
zero, however this leads to potential UAF since the xe_vm_get() only

I'm not sure I understand the call chain an where xe_vm_get() is coming
into play here.


works on a vm with refcount != 0. Since the asid is allocated in the vm
create ioctl, rather erase it when closing the vm, prior to dropping the
potential last ref. This should also work when user closes driver fd
without explicit vm destroy.

what seems weird is that you are moving it earlier in the call stack
rather than later, outside of the worker, to prevent the UAF.

what exactly was the UAF on?

UAF on the vm object. From the bug report it's when servicing some GPU 
fault, so inside handle_pagefault(). At this stage it just has some asid 
which is meant to map to some vm AFAICT. The lookup dance relies on 
calling xe_vm_get() after getting back the vm pointer from the xarray. 
Currently the asid is only erased from the xarray in 
vm_destroy_work_func() which is long after the refcount reaches zero and 
we are about to free the memory for the vm.

However xe_vm_get() is only meant to be called if you are already 
holding a ref, so if the vm refcount is already zero it just throws an 
error and continues on, and the caller has no idea. If that happens then 
as soon as we drop the usm lock the memory can be freed and it's game 
over. This looks to be what happens with the vm refcount reaching zero, 
and the handle_pagefault() still being able to reach the 
soon-to-be-freed vm via the xarray. With this patch we now erase from 
the xarray before we drop what is potentially the final ref. That way if 
you can reach the vm via the xarray you should always be able get a 
valid ref.


Lucas De Marchi


Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a196dbe65252..c5c26b3d1b76 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1581,6 +1581,16 @@ void xe_vm_close_and_put(struct xe_vm *vm)
        xe->usm.num_vm_in_fault_mode--;
    else if (!(vm->flags & XE_VM_FLAG_MIGRATION))
        xe->usm.num_vm_in_non_fault_mode--;
+
+    if (vm->usm.asid) {
+        void *lookup;
+
+        xe_assert(xe, xe->info.has_asid);
+        xe_assert(xe, !(vm->flags & XE_VM_FLAG_MIGRATION));
+
+        lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
+        xe_assert(xe, lookup == vm);
+    }
    mutex_unlock(&xe->usm.lock);

    for_each_tile(tile, xe, id)
@@ -1596,24 +1606,15 @@ static void vm_destroy_work_func(struct 
work_struct *w)
    struct xe_device *xe = vm->xe;
    struct xe_tile *tile;
    u8 id;
-    void *lookup;

    /* xe_vm_close_and_put was not called? */
    xe_assert(xe, !vm->size);

    mutex_destroy(&vm->snap_mutex);

-    if (!(vm->flags & XE_VM_FLAG_MIGRATION)) {
+    if (!(vm->flags & XE_VM_FLAG_MIGRATION))
        xe_device_mem_access_put(xe);

-        if (xe->info.has_asid && vm->usm.asid) {
-            mutex_lock(&xe->usm.lock);
-            lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
-            xe_assert(xe, lookup == vm);
-            mutex_unlock(&xe->usm.lock);
-        }
-    }
-
    for_each_tile(tile, xe, id)
        XE_WARN_ON(vm->pt_root[id]);

--
2.44.0