On Wed, Aug 14, 2024 at 12:01:30PM +0100, Matthew Auld wrote:
> The fence lock is part of the queue, therefore in the current design
> anything locking the fence should then also hold a ref to the queue to
> prevent the queue from being freed.
>
> However, currently it looks like we signal the fence and then drop the
> queue ref, but if something is waiting on the fence, the waiter is
> kicked to wake up at some later point, where upon waking up it first
> grabs the lock before checking the fence state. But if we have already
> dropped the queue ref, then the lock might already be freed as part of
> the queue, leading to a use-after-free.
>
> To prevent this, move the fence lock into the fence itself so we don't
> run into lifetime issues. An alternative might be to have a device-level
> lock, or to only release the queue in the fence release callback, however
> that might require pushing to another worker to avoid locking issues.
>
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020
> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> Cc: Matthew Brost <matthew.brost@xxxxxxxxx>

Good catch. This is indeed a problem, and you have it coded probably in
the safest way possible.
With that:
Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx>

> Cc: <stable@xxxxxxxxxxxxxxx> # v6.8+
> ---
>  drivers/gpu/drm/xe/xe_exec_queue.c          | 1 -
>  drivers/gpu/drm/xe/xe_exec_queue_types.h    | 2 --
>  drivers/gpu/drm/xe/xe_preempt_fence.c       | 3 ++-
>  drivers/gpu/drm/xe/xe_preempt_fence_types.h | 2 ++
>  4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 971e1234b8ea..0f610d273fb6 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -614,7 +614,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>
>  	if (xe_vm_in_preempt_fence_mode(vm)) {
>  		q->lr.context = dma_fence_context_alloc(1);
> -		spin_lock_init(&q->lr.lock);
>
>  		err = xe_vm_add_compute_exec_queue(vm, q);
>  		if (XE_IOCTL_DBG(xe, err))
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 1408b02eea53..fc2a1a20b7e4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -126,8 +126,6 @@ struct xe_exec_queue {
>  		u32 seqno;
>  		/** @lr.link: link into VM's list of exec queues */
>  		struct list_head link;
> -		/** @lr.lock: preemption fences lock */
> -		spinlock_t lock;
>  	} lr;
>
>  	/** @ops: submission backend exec queue operations */
> diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
> index 56e709d2fb30..83fbeea5aa20 100644
> --- a/drivers/gpu/drm/xe/xe_preempt_fence.c
> +++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
> @@ -134,8 +134,9 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q,
>  {
>  	list_del_init(&pfence->link);
>  	pfence->q = xe_exec_queue_get(q);
> +	spin_lock_init(&pfence->lock);
>  	dma_fence_init(&pfence->base, &preempt_fence_ops,
> -		       &q->lr.lock, context, seqno);
> +		       &pfence->lock, context, seqno);
>
>  	return &pfence->base;
> }
> diff --git a/drivers/gpu/drm/xe/xe_preempt_fence_types.h b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
> index b54b5c29b533..312c3372a49f 100644
> --- a/drivers/gpu/drm/xe/xe_preempt_fence_types.h
> +++ b/drivers/gpu/drm/xe/xe_preempt_fence_types.h
> @@ -25,6 +25,8 @@ struct xe_preempt_fence {
>  	struct xe_exec_queue *q;
>  	/** @preempt_work: work struct which issues preemption */
>  	struct work_struct preempt_work;
> +	/** @lock: dma-fence fence lock */
> +	spinlock_t lock;
>  	/** @error: preempt fence is in error state */
>  	int error;
>  };
> --
> 2.46.0
>