Re: [RFC PATCH] drm/sched: Fix a UAF on drm_sched_fence::sched

Christian König <christian.koenig@xxxxxxx> · Wed, 4 Sep 2024 09:40:36 +0200

    Am 03.09.24 um 10:13 schrieb Simona Vetter:

    [SNIP]

          So I think the issue is much, much bigger, and there's more. And the
issue is I think a fundamental design issue of dma_fence itself, not
individual users.

        IIRC both Alex and I pointed out this issue on the very first dma_fence
code and nobody really cared.

      I guess way back then we didn't really sort out any of the hotunplug
issues, and there weren't any fw ctx schedulers at least on our horizons
yet. Thin excuse, I know ...

    Well it's just when you have a bee sting and a broken leg, which do
    you attend to first? :)

            I think at the core it's two constraints:

- dma_fence can stick around practically forever in various container
   objects. We only garbage collect when someone looks, and not even then
   consistently.

- fences are meant to be cheap, so they do not have the big refcount going
   on that other shared objects such as dma_buf have

Specifically there's also no refcounting on the module itself with the
->owner and try_module_get stuff. So even if we fix all these issues on
the data structure lifetime side of things, you might still oops calling
into dma_fence->ops->release.

Oops.
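
To make the missing "->owner and try_module_get stuff" concrete, here is a minimal userspace sketch of what pinning the module per fence could look like. Everything in it (toy_module, toy_try_module_get, pinned_fence and friends) is a hypothetical stand-in, not kernel API, and the real dma_fence_ops has no owner field, which is exactly the gap described above. Note that taking a module reference per fence adds cost to an object that is meant to be cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model. All names here are hypothetical, not kernel API. */
struct toy_module {
    unsigned int refcount; /* users that may still call into module code */
    bool going;            /* set once unload has succeeded */
};

/* Stand-in for try_module_get(): refuses once the module is going away. */
static bool toy_try_module_get(struct toy_module *m)
{
    if (m->going)
        return false;
    m->refcount++;
    return true;
}

/* Stand-in for module_put(). */
static void toy_module_put(struct toy_module *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* Unload only succeeds once nobody can call into the module any more. */
static bool toy_module_unload(struct toy_module *m)
{
    if (m->refcount > 0)
        return false;
    m->going = true;
    return true;
}

struct pinned_fence;

struct pinned_fence_ops {
    struct toy_module *owner;                /* assumed field, see lead-in */
    void (*release)(struct pinned_fence *f); /* lives in "module" code */
};

struct pinned_fence {
    const struct pinned_fence_ops *ops;
};

/* Pin the ops' owner for as long as the fence may call back into it. */
static bool pinned_fence_init(struct pinned_fence *f,
                              const struct pinned_fence_ops *ops)
{
    if (!toy_try_module_get(ops->owner))
        return false;
    f->ops = ops;
    return true;
}

/* Call into the module one last time, then drop the pin. */
static void pinned_fence_release(struct pinned_fence *f)
{
    struct toy_module *owner = f->ops->owner;

    if (f->ops->release)
        f->ops->release(f);
    toy_module_put(owner);
}
```

In this model an unload attempt fails while any fence is still alive, so the release callback can never run after the code behind it is gone.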

        Yes, exactly that. I'm a bit surprised that you realize that only now :)

We have had this issue for at least 10 years or so and it pops up every now and
then on my desk because people complain that unloading amdgpu crashes.

      Yeah I knew about the issue. The new idea that popped into my mind is that
I think we cannot plug this properly unless we do it in dma_fence.c for
everyone, and essentially reshape the lifetime rules for that from yolo
to something actually well-defined.

Kinda similar work to how dma_resv locking rules and fence book-keeping
were unified to something that actually works across drivers ...

    Well sounds like I've just got more items on my TODO list.

    I have patches waiting to be sent out going into this direction
    anyway, will try to get them out by the end of the week and then we
    can discuss what's still missing.

    Christian.

          I think the complete solution is if we change all of this code so that core
dma-fence.c code guarantees to never ever again call into any driver code
after dma_fence_signal has been called, and takes over the final kfree_rcu
itself. But that's a gigantic change. But I think it's the only way to
really fix this mess:

- drivers will clean up any of their own references in a timely fashion,
   so no more accidentally lingering gpu contexts or vms and the bos they
   have mapped lying around.

- there are no lifetime or other use-after-free issues anywhere for fences
   anymore
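
A rough userspace sketch of that rule, under stated assumptions: the core snapshots what it still needs at signal time, gives the driver one last callback to drop its own references, then clears the ops pointer and owns the final free itself. All names (dfence, dfence_ops, the demo_* driver) are hypothetical, plain free() stands in for kfree_rcu, and copying the timeline name into the fence is just one possible way to keep debugging output available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace toy model. All names here are hypothetical, not kernel API. */
struct dfence;

struct dfence_ops {
    const char *(*timeline_name)(struct dfence *f);
    void (*signalled)(struct dfence *f); /* driver drops its own refs here */
};

struct dfence {
    const struct dfence_ops *ops; /* NULL once the fence has signalled */
    unsigned int refcount;
    bool is_signalled;
    char timeline[32];            /* snapshot taken at signal time */
};

static struct dfence *dfence_create(const struct dfence_ops *ops)
{
    struct dfence *f = calloc(1, sizeof(*f));

    if (!f)
        return NULL;
    f->ops = ops;
    f->refcount = 1;
    return f;
}

/* Last point at which the core is allowed to call into driver code. */
static void dfence_signal(struct dfence *f)
{
    if (f->is_signalled)
        return;
    snprintf(f->timeline, sizeof(f->timeline), "%s",
             f->ops->timeline_name(f));
    if (f->ops->signalled)
        f->ops->signalled(f);
    f->ops = NULL; /* detach: the driver may go away from here on */
    f->is_signalled = true;
}

/* Works before and after signalling; afterwards it uses the snapshot. */
static const char *dfence_timeline(struct dfence *f)
{
    return f->ops ? f->ops->timeline_name(f) : f->timeline;
}

/* The core owns the final free, so no driver release hook is needed. */
static void dfence_put(struct dfence *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        free(f);
}

/* Tiny demo "driver" that counts how often the core calls into it. */
static int demo_driver_calls;

static const char *demo_timeline_name(struct dfence *f)
{
    (void)f;
    demo_driver_calls++;
    return "demo-ring";
}

static void demo_signalled(struct dfence *f)
{
    (void)f;
    demo_driver_calls++;
}

static const struct dfence_ops demo_ops = {
    .timeline_name = demo_timeline_name,
    .signalled = demo_signalled,
};
```

After dfence_signal() returns, neither dfence_timeline() nor dfence_put() touches demo_ops again, so a container holding the fence for a long time no longer keeps any driver code or data alive.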

Downside is that some of the debugging stuff becomes a bit less useful.
But e.g. tracepoints could just dump the timeline once at creation or when
signalling, and so you don't need to dump it anymore when freeing. And a
signalled fence is generally not a problem anymore, so in a compositor
that's also all fine (iirc you can get at some of this stuff through the
sync_file interfaces too).

The other downside is that it's a huge pile of work, but I don't think we
can get to an actually solid design with fewer headaches and less pain ...

Thoughts?

        The alternative is to use the scheduler fence(s) to decouple hardware fences
from the containers. That would be rather cheap to implement.

The only downside would be that the scheduler module probably stays loaded
forever once used. But at least I can live with that.
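
As a sketch of what that decoupling could look like, assuming containers only ever hold a proxy fence owned by the scheduler and the proxy drops its hardware fence reference as soon as the hardware fence completes. The names (hw_fence_model, proxy_fence_model, proxy_hw_done) are made up for this userspace illustration and do not describe what drm_sched does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model. All names here are hypothetical, not kernel API. */
struct hw_fence_model {
    unsigned int refcount; /* at 0 the driver is free to tear it down */
    bool signalled;
};

struct proxy_fence_model {
    unsigned int refcount;         /* references held by containers */
    bool signalled;
    struct hw_fence_model *parent; /* only valid until the proxy signals */
};

static void hw_fence_put(struct hw_fence_model *hw)
{
    assert(hw->refcount > 0);
    hw->refcount--;
}

/* The proxy takes its own reference on the hardware fence. */
static void proxy_init(struct proxy_fence_model *p, struct hw_fence_model *hw)
{
    hw->refcount++;
    p->refcount = 1;
    p->signalled = false;
    p->parent = hw;
}

/*
 * Called when the hardware fence completes: signal the proxy and cut the
 * link, so nothing reachable from a container points at driver memory.
 */
static void proxy_hw_done(struct proxy_fence_model *p)
{
    if (p->signalled)
        return;
    p->parent->signalled = true;
    p->signalled = true;
    hw_fence_put(p->parent);
    p->parent = NULL;
}
```

The proxy can then linger in containers indefinitely; only the code that implements the proxy (the scheduler module in this proposal) has to stay loaded.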

      Yeah I think interim it's an ok stop-gap. But aside from keeping the
scheduler code pinned forever I think there's some more things:

- I'm not sure we can do it, without digging into dma_fence.c locking
  internals too much.

- It de facto means you can use dma_fence that are fence containers and
  drm_sched_job_fence, and nothing else. And drivers will get this wrong
  and do dma_fence ad-hoc for stuff like tlb flushing, or pte writing, and
  whatever else, that won't necessarily go through a drm_sched.

So not great imo, which is why I've shifted towards fixing
this in dma_fence.c code for everyone.
-Sima