Re: [RFC PATCH] drm/sched: Fix a UAF on drm_sched_fence::sched

On 30.08.24 at 23:43, Matthew Brost wrote:
On Fri, Aug 30, 2024 at 10:14:18AM +0200, Christian König wrote:
On 29.08.24 at 19:12, Boris Brezillon wrote:
dma_fence objects created by an entity might outlive the
drm_gpu_scheduler this entity was bound to if those fences are retained
by other objects, like a dma_buf resv. This means that
drm_sched_fence::sched might be invalid when the resv is walked, which
in turn leads to a UAF when dma_fence_ops::get_timeline_name() is called.
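
A minimal sketch of the lifetime problem (dma_fence_get() and
drm_sched_fini() are real APIs; the surrounding driver code is
illustrative only):

	/* the finished fence is still referenced, e.g. through a dma_buf resv */
	struct dma_fence *f = dma_fence_get(&sched_fence->finished);

	drm_sched_fini(sched);		/* scheduler torn down... */
	kfree(sched);			/* ...and freed by the driver */

	/* later, when the resv is walked: */
	f->ops->get_timeline_name(f);	/* reads fence->sched->name: UAF */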

This probably went unnoticed so far because the drm_gpu_scheduler had
the lifetime of the drm_device, so, unless you were removing the device,
there was no reason for the scheduler to be gone before its fences.
Nope, that is intentional design. get_timeline_name() is not safe to be
called after the fence has signaled because that would cause circular
dependency problems.

I'm quite sure that happens: ftrace for example can and will call
get_timeline_name() in trace_dma_fence_destroy(), which is certainly after
the fence is signaled. There are likely other cases too - this one just
quickly came to mind.
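
Roughly the path I have in mind, going from memory of the tracepoint
definitions in include/trace/events/dma_fence.h:

	dma_fence_release()
	  -> trace_dma_fence_destroy(fence)
	       -> __string()/__assign_str() evaluating
	          fence->ops->get_timeline_name(fence)

so the callback runs on a fence that has already signaled.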

Good point, I had completely forgotten about ftrace.

E.g. when you have hardware fences, it can happen that the fences
reference a driver module (for the function printing the name) and the
module in turn keeps fences around.

I am almost positive that without this patch this is a problem in Xe, or
in any driver in which schedulers are tied to IOCTLs rather than to the
kernel module.

In Xe, 'fence->sched' maps to an xe_exec_queue which can be freed once
the destroy exec queue IOCTL is called and all jobs are freed (i.e.
'fence' signals). The fence can live on after that in dma-resv objects,
drm syncobjs, etc...

I know this issue has been raised before and was basically NACK'd, but I
have a strong opinion that this fix is valid and in fact required.

I've NACK'd automatically signaling pending fences on destruction of the scheduler (which reminds me that I wanted to add a warning for that), and copying the name into every scheduler fence.

As long as we don't do either of those, I'm perfectly fine with fixing this issue. The approach of creating a reference-counted object for the name looks rather valid to me.

Especially since we then pretty much get the module references correct for free as well.

Christian.


Matt

So you easily end up with a module you can never unload.


With the introduction of a new model where each entity has its own
drm_gpu_scheduler instance, this situation is likely to happen every time
a GPU context is destroyed and some of its fences remain attached to
dma_buf objects still owned by other drivers/processes.

In order to make drm_sched_fence_get_timeline_name() safe, we need to
copy the scheduler name into our own refcounted object that's only
destroyed when both the scheduler and all its fences are gone.

The fact that drm_sched_fence might keep a reference to the
drm_gpu_scheduler even after it's been released is worrisome, but I'd
rather discuss that with everyone than come up with a solution that's
likely to end up being rejected.

Note that the bug was found while repeatedly reading dma_buf's debugfs
file, which, at some point, calls dma_resv_describe() on a resv that
contains signalled fences coming from a destroyed GPU context.
AFAIK, there's nothing invalid there.
Yeah, but reading debugfs isn't supposed to crash the kernel.

On the other hand the approach with a kref'ed string looks rather sane to
me. One comment on this below.

Cc: Luben Tuikov <ltuikov89@xxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: "Christian König" <christian.koenig@xxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
---
  drivers/gpu/drm/scheduler/sched_fence.c |  8 +++-
  drivers/gpu/drm/scheduler/sched_main.c  | 18 ++++++++-
  include/drm/gpu_scheduler.h             | 52 +++++++++++++++++++++++++
  3 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 0f35f009b9d3..efa2a71d98eb 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
  static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
  {
  	struct drm_sched_fence *fence = to_drm_sched_fence(f);
-	return (const char *)fence->sched->name;
+	return (const char *)fence->timeline->name;
  }
  static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -112,8 +112,10 @@ static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
   */
  void drm_sched_fence_free(struct drm_sched_fence *fence)
  {
+	drm_sched_fence_timeline_put(fence->timeline);
+
  	/* This function should not be called if the fence has been initialized. */
-	if (!WARN_ON_ONCE(fence->sched))
+	if (!WARN_ON_ONCE(fence->timeline))
  		kmem_cache_free(sched_fence_slab, fence);
  }
@@ -224,6 +226,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
  	unsigned seq;
  	fence->sched = entity->rq->sched;
+	fence->timeline = fence->sched->fence_timeline;
+	drm_sched_fence_timeline_get(fence->timeline);
  	seq = atomic_inc_return(&entity->fence_seq);
  	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
  		       &fence->lock, entity->fence_context, seq);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..1e31a9c8ce15 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1288,10 +1288,21 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
  		sched->own_submit_wq = true;
  	}
+	sched->fence_timeline = kzalloc(sizeof(*sched->fence_timeline), GFP_KERNEL);
+	if (!sched->fence_timeline)
+		goto Out_check_own;
+
+	sched->fence_timeline->name = kasprintf(GFP_KERNEL, "%s", sched->name);
+	if (!sched->fence_timeline->name)
+		goto Out_free_fence_timeline;
+
+	kref_init(&sched->fence_timeline->kref);
+
  	sched->sched_rq = kmalloc_array(num_rqs, sizeof(*sched->sched_rq),
  					GFP_KERNEL | __GFP_ZERO);
  	if (!sched->sched_rq)
-		goto Out_check_own;
+		goto Out_free_fence_timeline;
+
  	sched->num_rqs = num_rqs;
  	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
  		sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
@@ -1319,6 +1330,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
  	kfree(sched->sched_rq);
  	sched->sched_rq = NULL;
+Out_free_fence_timeline:
+	if (sched->fence_timeline)
+		kfree(sched->fence_timeline->name);
+	kfree(sched->fence_timeline);
  Out_check_own:
  	if (sched->own_submit_wq)
  		destroy_workqueue(sched->submit_wq);
@@ -1367,6 +1382,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
  	sched->ready = false;
  	kfree(sched->sched_rq);
  	sched->sched_rq = NULL;
+	drm_sched_fence_timeline_put(sched->fence_timeline);
  }
  EXPORT_SYMBOL(drm_sched_fini);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5acc64954a88..615ca89f77dc 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -261,6 +261,52 @@ struct drm_sched_rq {
  	struct rb_root_cached		rb_tree_root;
  };
+/**
+ * struct drm_sched_fence_timeline - Wrapped around the timeline name
+ *
+ * This is needed to cope with the fact dma_fence objects created by
+ * an entity might outlive the drm_gpu_scheduler this entity was bound
+ * to, making drm_sched_fence::sched invalid and leading to a UAF when
+ * dma_fence_ops::get_timeline_name() is called.
+ */
+struct drm_sched_fence_timeline {
+	/** @kref: Reference count of this timeline object. */
+	struct kref			kref;
+
+	/**
+	 * @name: Name of the timeline.
+	 *
+	 * This is currently a copy of drm_gpu_scheduler::name.
+	 */
+	const char			*name;
Make that a char name[] and embed the name into the structure. The macro
struct_size() can be used to calculate the size.
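
Something along these lines, as a rough sketch (the alloc helper is
made up, not part of the patch):

	struct drm_sched_fence_timeline {
		struct kref	kref;
		char		name[];	/* flexible-array member */
	};

	static struct drm_sched_fence_timeline *
	drm_sched_fence_timeline_alloc(const char *name)
	{
		struct drm_sched_fence_timeline *tl;
		size_t len = strlen(name) + 1;

		tl = kzalloc(struct_size(tl, name, len), GFP_KERNEL);
		if (tl) {
			kref_init(&tl->kref);
			memcpy(tl->name, name, len);
		}

		return tl;
	}

The kref release callback then only needs a single kfree(tl).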

+};
+
+static inline void
+drm_sched_fence_timeline_release(struct kref *kref)
+{
+	struct drm_sched_fence_timeline *tl =
+		container_of(kref, struct drm_sched_fence_timeline, kref);
+
+	kfree(tl->name);
+	kfree(tl);
This avoids having two allocations for the timeline name.

Regards,
Christian.

+}
+
+static inline void
+drm_sched_fence_timeline_put(struct drm_sched_fence_timeline *tl)
+{
+	if (tl)
+		kref_put(&tl->kref, drm_sched_fence_timeline_release);
+}
+
+static inline struct drm_sched_fence_timeline *
+drm_sched_fence_timeline_get(struct drm_sched_fence_timeline *tl)
+{
+	if (tl)
+		kref_get(&tl->kref);
+
+	return tl;
+}
+
  /**
   * struct drm_sched_fence - fences corresponding to the scheduling of a job.
   */
@@ -289,6 +335,9 @@ struct drm_sched_fence {
  	 */
  	ktime_t				deadline;
+        /** @timeline: the timeline this fence belongs to. */
+	struct drm_sched_fence_timeline	*timeline;
+
          /**
           * @parent: the fence returned by &drm_sched_backend_ops.run_job
           * when scheduling the job on hardware. We signal the
@@ -488,6 +537,8 @@ struct drm_sched_backend_ops {
   * @credit_count: the current credit count of this scheduler
   * @timeout: the time after which a job is removed from the scheduler.
   * @name: name of the ring for which this scheduler is being used.
+ * @fence_timeline: fence timeline that will be passed to fences created by
+ *                  an entity that's bound to this scheduler.
   * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
   *           as there's usually one run-queue per priority, but could be less.
   * @sched_rq: An allocated array of run-queues of size @num_rqs;
@@ -521,6 +572,7 @@ struct drm_gpu_scheduler {
  	atomic_t			credit_count;
  	long				timeout;
  	const char			*name;
+	struct drm_sched_fence_timeline	*fence_timeline;
  	u32                             num_rqs;
  	struct drm_sched_rq             **sched_rq;
  	wait_queue_head_t		job_scheduled;

      

