Re: [PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

On 06/04/2023 18.15, Asahi Lina wrote:
On 06/04/2023 18.06, Christian König wrote:
Am 06.04.23 um 10:49 schrieb Asahi Lina:
On 06/04/2023 17.29, Christian König wrote:
Am 05.04.23 um 18:34 schrieb Asahi Lina:
A signaled scheduler fence can outlive its scheduler, since fences are
independently reference counted.

Well that is actually not correct. Schedulers are supposed to stay
around until the hw they have been driving is no longer present.

But the fences can outlive that. You can GPU render into an imported
buffer, which attaches a fence to it. Then the GPU goes away but the
fence is still attached to the buffer. Then you oops when you cat that
debugfs file...
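
(For reference, the unpatched callback is roughly the following, quoted
from memory so details may be slightly off; it chases a raw pointer
into the scheduler:)

static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
{
        struct drm_sched_fence *fence = to_drm_sched_fence(f);

        /* fence->sched is a plain pointer; nothing here guarantees the
         * scheduler it points at still exists when a signaled fence is
         * inspected later, e.g. from debugfs. */
        return (const char *)fence->sched->name;
}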

No, exactly that's the point: you wouldn't oops.


My use case does this way more often (since schedulers are tied to
UAPI objects), which is how I found this, but as far as I can tell
this is already broken for all drivers on unplug/unbind/anything else
that would destroy the schedulers while fences are potentially still
referenced by separate scanout devices or any other DMA-BUF consumer.

Even if a GPU is hot plugged, the data structures for it should only go
away with the last reference. Since the scheduler fence references the
hw fence, and the hw fence in turn references the driver, this
shouldn't happen.

So those fences can potentially keep half the driver data structures
alive just for existing in a signaled state? That doesn't seem sensible
(and it definitely doesn't work for our use case where schedulers can be
created and destroyed freely; it could lead to way too much junk
sticking around in memory).


E.g. the reference chain is scheduler_fence->hw_fence->driver->scheduler.
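
(In code terms, the chain being described is something like the sketch
below; field names are illustrative rather than exact:)

struct drm_sched_fence {
        struct dma_fence finished;       /* what external consumers hold */
        struct dma_fence *parent;        /* ref on the driver's hw fence */
        struct drm_gpu_scheduler *sched; /* raw pointer, not refcounted  */
};

/*
 * Only if the hw fence (parent) in turn pins the driver structures that
 * own *sched does a signaled, exported fence transitively keep sched
 * (and sched->name) valid.
 */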

It's up to drivers not to mess that up, since the HW fence has the
same requirement that it can outlive other driver objects, just like
any other fence. That's not something the scheduler has to be
concerned with; it's a driver correctness issue.

Of course, in C you have to get it right yourself, while correct
Rust abstractions will cause your code to fail to compile if you do it
wrong ^^

In my particular case, the hw_fence is a very dumb object that has no
references to anything, only an ID and a pending op count. Jobs hold
references to it and decrement it until it signals, not the other way
around. So that object can live forever regardless of whether the rest
of the device is gone.
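
(Very roughly, its shape is something like this; a C sketch with
made-up names, since the actual driver side is Rust:)

struct my_hw_fence {
        struct dma_fence base;
        u64 id;               /* firmware identifier for the fence */
        atomic_t pending_ops; /* outstanding per-job operations    */
};

static void my_hw_fence_op_done(struct my_hw_fence *f)
{
        /* Jobs hold a reference and drop their pending count as they
         * complete; the last completion signals the fence. */
        if (atomic_dec_and_test(&f->pending_ops))
                dma_fence_signal(&f->base);
}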

That is then certainly a bug. This won't work that way, and the timeline
name is just the tip of the iceberg here.

The fence reference count needs to keep both the scheduler and driver
alive. Otherwise you could, for example, unload the module and immediately
oops because your fence_ops go away.

Yes, I realized the module issue after writing the email. But that's the
*only* thing it needs to hold a reference to! Which is much more
sensible than keeping a huge chunk of UAPI-adjacent data structures
alive for a signaled fence that for all we know might stick around
forever attached to some buffer.
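
(The module part is easy enough to see: the fence's ops table lives in
module memory and keeps being called after signaling. A minimal
illustration, with made-up names:)

static const char *my_get_driver_name(struct dma_fence *f)
{
        return "my_driver";
}

static const struct dma_fence_ops my_fence_ops = {
        .get_driver_name   = my_get_driver_name,
        .get_timeline_name = my_get_driver_name, /* simplified */
};

/*
 * If the module is unloaded while someone still holds a signaled fence,
 * a later dma_fence_release() or a debugfs dump calling
 * ->get_timeline_name() jumps through my_fence_ops into freed module
 * text.
 */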

Your use case is now completely different from that, and this won't work
any more.

This here might just be the first case where that breaks.

This bug already exists; it's just a lot rarer for existing use
cases... but either way, Xe is doing the same thing I am, so I'm not
the only one here either.

No, it doesn't. You have just implemented the references differently from
how they are supposed to be.

Fixing this one occurrence here would mitigate the immediate oops, but
doesn't fix the fundamental problem.

Honestly, at this point I'm starting to doubt whether we want to use
drm_scheduler at all. It clearly wasn't designed for our use case;
every time I try to fix something, your answer has been "you're using
it wrong", and at the same time the practically nonexistent
documentation makes it impossible to know how it was actually designed
to be used.

Also, requiring that the hw_fence keep the scheduler alive (which is documented nowhere) is a completely ridiculous lifetime requirement and a layering violation across multiple subsystems. I have no idea how I'd make Rust abstractions uphold that requirement safely without doing something silly like having abstraction-specific hw_fence wrappers, and then you run into deadlocks because the scheduler can end up being dropped from the job_done callback when the fence reference is dropped. Just... no, please. This is terrible. If you don't want me to fix it, we'll have to find another way, because I can't work with this.

~~ Lina



