On 21.07.23 12:33, Asahi Lina wrote:
[SNIP]
I've already tried to explain the issue. The DRM scheduler design, as
it stands today, makes it impractical to write a safe Rust abstraction
for it. This is a fact. Christian has repeatedly NAKed my attempts at
changing it to make such a safe abstraction possible, and is clearly
opposed to the fundamental lifetime requirements change I am trying to
make here. Therefore, we are at an impasse.
It's unfortunate, but given this situation, the DRM scheduler will not
be available to Rust DRM drivers. I thought the goal of the DRM
subsystem common code was to cater to multiple drivers and usage
approaches, with an emphasis on doing things "right" and avoiding
design issues that are common mistakes in driver design. Clearly, this
is not the case for all of DRM, at least not the DRM scheduler.
In software engineering, complex lifetime requirements are an
anti-pattern, which is one reason why Rust draws a line between safe
and unsafe APIs. For a safe API, it is up to the API developer to
design it such that it cannot be misused in a way that causes memory
safety issues, and the only lifetime requirements it can impose are
those that can be expressed in the Rust type system and statically
checked at compile time. The DRM scheduler's complex chain of lifetime
requirements cannot be expressed that way, and wrapping it in enough glue
to remove those requirements would require about as much code as just
rewriting it, while adding unacceptable duplication and overhead.
In kernel Rust, we strive to only have safe APIs for components which
have no fundamental reason for being unsafe (things like direct memory
mapping and raw hardware access). The DRM scheduler does not fall into
one of those "inherently unsafe" categories, so it doesn't make sense
to provide a raw unsafe API for it.
This is not completely correct. The DRM scheduler provides a dma_fence
interface as a wrapper around the hardware dma_fence interfaces.
This interface exists to allow core Linux memory management to query the
progress of hardware operations.
So you are working with something very low level here and have to follow
restrictions which Rust can't enforce at the language level, because it
simply can't know about them at compile time.
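To make that concrete, here is a minimal sketch (made-up function name, not
taken from any driver) of how a generic consumer such as core memory
management sees such a fence; it only knows about struct dma_fence and the
generic helpers, and has no idea when the producing driver considers itself
done with the object:

#include <linux/dma-fence.h>

/* Sketch: a generic consumer takes a reference of unknown lifetime and may
 * call back into fence->ops (e.g. get_timeline_name for trace points) long
 * after the scheduler or device that produced the fence has been torn down
 * from the driver's point of view. */
static long consumer_wait_for_work(struct dma_fence *fence)
{
        long ret;

        dma_fence_get(fence);
        ret = dma_fence_wait(fence, false);
        dma_fence_put(fence);

        return ret;
}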
Doing so would just expose Rust code to the kind of subtle safety
implications that cause memory problems every day in C. If I were to
use drm_sched's unsafe API "as is" in my driver, it would *by far* be
the least auditable, most complex usage of unsafe code in the entire
driver, and I have no confidence that I would be able to get it right
and keep it right as the driver changes.
I don't see a reason why this cannot be simply fixed in drm_sched, but
Christian disagrees, and has repeatedly (and strongly) NAKed my
attempts at doing so, and indeed NAKed the entire premise of the
change in lifetime requirements itself. So here we are. There is no
solution that will work for everyone that involves drm_sched.
So I don't have a choice other than to just not use drm_sched and roll
my own. I will happily move that Rust implementation to common code if
another Rust driver comes along and wants to use it. I'm not sure if
that kind of thing can be easily made available to C drivers in
reverse, but I guess we'll cross that bridge when a C driver expresses
interest in using it.
Well, to make it clear once more: signaling a dma_fence from the
destructor of a reference-counted object is very problematic! This will
be rejected regardless of whether you do it in C or in Rust.
What we can do is make it safe in the sense that you don't access freed
memory, by making the scheduler fences act even more as a wrapper around
the hardware fence than they do now. But that is quite a change and
requires a bit more than just hacking around
drm_sched_fence_get_timeline_name().
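To spell out the pattern that gets rejected, here is a minimal made-up
example (not code from this series):

#include <linux/dma-fence.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct my_job {
        struct kref refcount;
        struct dma_fence *done_fence;
};

/* Problematic: whether and when done_fence signals now depends on when the
 * last reference to the job happens to be dropped, not on hardware
 * progress.  Consumers, including core memory management, may be blocked
 * waiting on exactly this fence. */
static void my_job_release(struct kref *kref)
{
        struct my_job *job = container_of(kref, struct my_job, refcount);

        dma_fence_signal(job->done_fence);
        dma_fence_put(job->done_fence);
        kfree(job);
}

static void my_job_put(struct my_job *job)
{
        kref_put(&job->refcount, my_job_release);
}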
So far it seems existing C drivers are happy with drm_sched's design and
its complex lifetime requirements, even though those aren't documented
anywhere. Perhaps their authors managed to reverse engineer them from the
source, or someone told them (certainly nobody told me when I started
using drm_sched). Or maybe a bunch of the drm_sched users are actually
broken and have memory safety issues in corner cases. I don't know,
though if I had to bet, I'd bet on the latter.
Actually, let's do a quick search and find out!
panfrost_remove() -> panfrost_device_fini() -> panfrost_job_fini() ->
drm_sched_fini()
There is a direct, synchronous path between device removal and
destroying the DRM scheduler. At no point does it wait for any jobs to
complete. That's what patch #3 in this series tries to fix.
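For comparison, a teardown that plays by drm_sched's current (undocumented)
rules would have to look roughly like this; my_device_wait_idle() is a
hypothetical helper, and this is just a sketch, not the actual patch:

#include <drm/gpu_scheduler.h>

struct my_device {
        struct drm_gpu_scheduler sched;
        /* ... */
};

/* Hypothetical helper: blocks until every job pushed to the scheduler has
 * signaled its finished fence. */
void my_device_wait_idle(struct my_device *mdev);

static void my_job_fini(struct my_device *mdev)
{
        /* The scheduler must not be destroyed while jobs are in flight, so
         * without a fix in drm_sched_fini() itself, the driver has to wait
         * for them first. */
        my_device_wait_idle(mdev);
        drm_sched_fini(&mdev->sched);
}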
In fact, it doesn't even keep the entities alive! It calls
drm_dev_unregister() before everything else, but as the docs for the
DRM driver lifetimes clearly say (see, docs!), objects visible to
userspace must survive that and only be released from the release
callback. drm_sched entities are created/destroyed from
panfrost_job_open()/panfrost_job_close(), which are called from
panfrost_open() and panfrost_postclose(), which are the userspace file
open/close functions.
I already handle that one in the Rust abstraction (it's relatively easy
to fix), so from my point of view it doesn't need a drm_sched patch, but
it is yet another subtle, undocumented lifetime requirement that is,
evidently, impossible to know about and get right without documentation.
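For reference, the rule the lifetime docs do state looks roughly like this
on the driver side (a minimal sketch with made-up names, not panfrost
code):

#include <drm/drm_drv.h>

/* Called only once the last reference to the drm_device is dropped, which
 * can be long after the bus ->remove() callback returned and
 * drm_dev_unregister() was called.  Userspace-visible state must stay
 * valid until this point (or be tied to the device with drmm_* managed
 * allocations). */
static void my_drm_release(struct drm_device *ddev)
{
        /* free userspace-visible, device-level state here, not in ->remove() */
}

static const struct drm_driver my_drm_driver = {
        .release = my_drm_release,
        /* ... */
};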
Meanwhile, panfrost_fence_ops has no release() callback, which means
there is no reference path stopping device removal (and therefore
scheduler teardown) or even module unload while driver fences are
still alive. That leads to the UAF that patch #2 in this series tries to fix.
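One way to close that hole would be for every driver fence to pin the
device until the fence itself is released, roughly like the sketch below
(made-up names, not necessarily what patch #2 does, and keeping the module
itself alive would still need separate handling):

#include <linux/dma-fence.h>
#include <drm/drm_drv.h>

struct my_fence {
        struct dma_fence base;
        struct drm_device *ddev;        /* reference taken at fence creation */
};

static const char *my_fence_get_driver_name(struct dma_fence *fence)
{
        return "my-driver";
}

static const char *my_fence_get_timeline_name(struct dma_fence *fence)
{
        return "my-timeline";
}

static void my_fence_release(struct dma_fence *fence)
{
        struct my_fence *f = container_of(fence, struct my_fence, base);

        /* Drop the reference that kept device teardown from completing
         * while this fence was still alive. */
        drm_dev_put(f->ddev);
        dma_fence_free(fence);          /* the default behaviour we replace */
}

static const struct dma_fence_ops my_fence_ops = {
        .get_driver_name   = my_fence_get_driver_name,
        .get_timeline_name = my_fence_get_timeline_name,
        .release           = my_fence_release,
};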
In other words, as far as I can tell, the panfrost driver gets
*everything* wrong when it comes to the DRM scheduler lifetime
requirements, and will crash and burn if the driver is unbound while
jobs are in flight, while anyone has a file descriptor open at all, or
even while any shared buffer holding a DRM scheduler fence from it is
bound to the display controller at that time.
Yeah, what you wrote is perfectly correct.
Daniel and I have gone back and forth multiple times on how we should
design this, and we opted not to use direct pointers for the contexts
because that allows for simpler driver implementations.
The DRM scheduler doesn't document the lifetime requirements because it
doesn't define the lifetime requirements. By design, the DRM scheduler is
supposed to be a component wrapping around DMA fences, and those DMA
fences have the necessary lifetime definition.
Now, DMA fences have their life cycles explained in the structure
documentation, but it doesn't really say much about the requirements for
dma_fence_ops implementations. We should probably improve that.
So yes, drivers need to keep the structures which might be accessed by
userspace alive even after the underlying device is removed. But
signaling dma_fences is completely independent of that.
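As a rough sketch of that separation (made-up driver structures, locking
omitted): on removal everything outstanding is signaled right away, but
the objects themselves live on until their last reference is dropped.

#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/list.h>

struct my_hw_fence {
        struct dma_fence base;
        struct list_head node;
};

struct my_dev {
        struct list_head fences;        /* outstanding fences */
};

static void my_dev_remove(struct my_dev *mdev)
{
        struct my_hw_fence *f;

        list_for_each_entry(f, &mdev->fences, node) {
                dma_fence_set_error(&f->base, -ENODEV);
                dma_fence_signal(&f->base);
                /* No freeing here: the fence stays alive until the last
                 * dma_fence_put(), which may happen much later and from a
                 * completely different part of the kernel. */
        }
}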
This is why this kind of design is not allowed in Rust.
Well it is allowed, it's just not safe.
Regards,
Christian.
Because nobody gets it right. *Especially* not without docs. I
assumed, like the authors of the Panfrost driver clearly assumed, that
the DRM scheduler API would not have these crazy undocumented hidden
requirements. The only reason I found out the hard way is that I happen
to create and destroy schedulers all the time, not just once globally, so
I would hit the bugs and dangling pointers much more often than
Panfrost users who, most likely, never unbind their devices. But we
both have the same problem.
I think I've done all I can to explain the issues and try to fix them,
so the ball is in your court now. If you want to keep the current
design, that's fine, but Rust code will not be a drm_sched user in
that case. And the rest of the DRM folks in the C world will have to
contend with these issues and fix all the problems in the C drivers
(I'm sure panfrost isn't the only one, it's just literally the first
one I looked at).
As for me, I'm happy to write a simple workqueue-based Rust scheduler
suitable for firmware-managed scheduler devices. Honestly, at this
point, I have very little faith left in my ability to fix all these
issues in drm_sched (I know there's at least one more lurking: I saw a
NULL deref but wasn't able to reproduce it or divine how it could
possibly have happened). That, combined with the hostility from the AMD
folks about my attempts to improve drm_sched even just a little bit,
makes that decision very easy.
Farewell, DRM scheduler. It was nice trying to work with you, but
things just didn't work out. I won't be submitting a v2 to this series
myself. Please ping me if you fix all these fundamental design issues
with drm_sched and think I might actually be able to use it safely in
Rust one day. If, at that point, the API design is solid and safe and the
implementation is done in a way that inspires confidence, then maybe we
can yank out my Rust solution and switch back to drm_sched.
Just please don't expect me to do the work anymore; I've done everything
I can, and this now has to come from you, not me. I've spent
way more time understanding drm_sched, auditing its code,
understanding its design intent, trying to fix it, and getting yelled
at for it than it would take to write a new, clean, safe Rust
scheduler. I don't regret some of the time spent (some of the
implementation details of drm_sched have taught me useful things), but
I'm not prepared to spend any more, sorry.
~~ Lina