On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> > - Nuke the driver binding manually through sysfs with the unbind files.
> > - Nuke all userspace that might be holding files and other resources
> >   open.
> > - At this point the module refcount should be zero and you can unload
> >   it.
> >
> > Except developers really don't like the manual unbind step, and so
> > we're missing try_module_get() in a bunch of places where it really
> > should be.
>
> IMHO they are not missing, we just have a general rule that if a
> cleanup function, required to be called prior to module exit, revokes
> any .text pointers then you don't need to hold the module refcount.
>
> file_operations doesn't have such a cleanup function, which is why it
> takes the refcount.
>
> hrtimer does have such a function, which is why it doesn't take the
> refcount.

I was talking about a bunch of other places where it works like
file_operations, except we don't bother with the module reference count.
I've seen patches fly by where people "fix" these things because module
unload is "broken".

> > Now wrt why you can't just solve this all at the subsystem level and
> > guarantee that after drm_dev_unplug no code is running in driver
> > callbacks anymore:
> >
> > In really, really simple subsystems like backlight this is doable. In
> > drm with arbitrary ioctl this isn't, and you get to make a choice:
>
> It is certainly doable, you list the right way to do it right below
> and RDMA implements that successfully.
>
> The subsystem owns all FDs and proxies all file_operations to the
> driver (after improving them :) and that is protected by a rwsem/SRCU
> that is safe against the removal path setting all driver ops to NULL.

I'm not saying that any of these approaches are bad. For some cases we
plan to use them in gpu code too, even. The above is pretty much the plan
we have for dma_fence.

> > - You wait until all driver code finishes, which could be never if
> >   there's an ioctl that waits for render to complete and doesn't
> >   handle hotunplug correctly. This is a deadlock.
>
> Meh. We waited for all FDs to close for a long time. It isn't a
> "deadlock", it is just a wait on userspace that extends to module
> unload. Very undesirable, yes, but not the end of the world, it can
> resolve itself if userspace shuts down.
>
> But, IMHO, the subsystem and driver should shoot down the waits during
> remove.
>
> Your infinite waits are all interruptible, right? :)

So yeah, userspace you can shoot down with SIGKILL, assuming really good
programming. But there are also all the in-kernel operations between
various work queues and other threads. This can be easily fixed by just
rewriting the entire thing into a strict message passing paradigm.
Unfortunately rust has to interop with the current existing mess.

gpu drivers can hog console_lock (yes, we're trying to get away from that
as much as possible), and at that point a cavalier attitude of "you can
just wait" isn't very appreciated. And once you've made sure that really
everything can bail out, you've gotten pretty close to reimplementing
revocable resources.
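To make that concrete, what I mean by a revocable resource is roughly the
following, in C terms. This is a hand-waving sketch with made-up names,
not actual drm or rust-binding code: accessors only work inside a plain
RCU read-side section (so no sleeping), and hotunplug revokes the
resource and waits for all readers before anything backing it can go
away.

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

/* illustrative only, every name in here is invented */
struct fake_hw {
        void __iomem *mmio;
};

struct fake_device {
        struct fake_hw __rcu *hw;       /* revoked (set to NULL) on hotunplug */
};

static int fake_write_reg(struct fake_device *fdev, u32 offset, u32 val)
{
        struct fake_hw *hw;
        int ret = -ENODEV;

        rcu_read_lock();                /* no sleeping allowed from here on */
        hw = rcu_dereference(fdev->hw);
        if (hw) {
                writel(val, hw->mmio + offset);
                ret = 0;
        }
        rcu_read_unlock();

        return ret;
}

static void fake_revoke(struct fake_device *fdev)
{
        RCU_INIT_POINTER(fdev->hw, NULL);
        synchronize_rcu();              /* all readers are guaranteed done */
        /* now it is safe to iounmap, free the hw state and drop .text */
}

The rust Revocable is essentially this, with the compiler making sure you
cannot forget the "is it still there" check on every single access.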
> > In my experience this is theoretically possible, practically no one
> > gets this right, and de facto it means that actual hotunplug under
> > load has a good chance of just hanging forever. Which is why drm
> > doesn't do this.
>
> See, we didn't have this problem as we don't have infinite waits in the
> driver as part of the API. The API toward the driver is event driven..

Yeah, rolling everything over to event passing and message queues would
sort this out a lot. It's kinda not where we are though.

> I can understand that adding the shootdown logic all over the place
> would be hard and you'd get it wrong.
>
> But so is half removing the driver while it is doing *anything* and
> trying to mitigate that with a different kind of hard to do locking
> fix.

*shrug* The thing is that rust helps you enormously with implementing
revocable resources and making sure you're not cheating with all the
bail-out paths. It cannot help you with making sure you have
interruptible/abortable sleeps in all the right places. Yes, this is a
bit of a disappointment, but fundamentally rust cannot model negative
contexts (unlike strictly functional languages like haskell) where
certain operations are not allowed. But it is much, much better than C at
"this could fail, you must handle it and not screw up". In some cases you
can plug this gap with runtime validation, like the fake lockdep contexts
behind the might_alloc() checks and similar tricks we're using on the C
side too.

Given that I'm still struggling with weeding out design deadlocks in
normal operation - for example runtime pm is an absolute disaster on
this, and a lot of drivers fail real bad once you add lockdep annotations
for runtime pm - I'll probably retire before I get to doing this for
driver unload.

> > This is why I like the rust Revocable so much, because it's a normal
> > rcu section, so it disallows all sleeping. You might still deadlock on
> > a busy loop waiting for hw without having a timeout. But that's
> > generally fairly easy to spot, and good drivers have macros/helpers
> > for this so that there is always a timeout.
>
> The Revocable version narrows the critical sections to very small
> regions, but having critical sections at all is still, IMHO, hacky.
>
> What you should ask Rust to solve for you is the infinite waits! That
> is the root cause of your problem. Compiler enforces no waits without
> a revocation option on DRM callbacks!
>
> Wouldn't that be much better??

It would indeed be nice. I haven't seen that rust unicorn yet though, and
from my understanding it's just not something rust can give you. Rust
isn't magic, it's just a tool that can do a few fairly specific things a
lot better than C. But otherwise it's still the same mess.

> > drm_dev_unplug uses sleepable rcu for practicality reasons and so has
> > a much, much higher chance of deadlocks. Note that strictly speaking
> > drm_device should hold a module reference on the driver, but see above
> > for why we don't have that - developers prefer convenience over
> > correctness in this area.
>
> Doesn't DRM have a module reference because the fops is in the driver
> and the file core takes the driver module reference during
> fops_get()/replace_fops() in drm_stub_open()? Or do I misunderstand
> what that stub is for?
>
> Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?

Yeah, it's there, except only for the userspace references and not for
the kernel internal ones. Because developers get a bit prickly about
adding those, unfortunately, due to "it breaks module unload". Maybe we
should just add them, at least for rust.
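Just to spell out what "adding them" would mean: any long-lived kernel
internal reference that stashes a pointer into driver .text would pin the
driver module explicitly, the same way the file core pins fops->owner on
open. Rough sketch with invented names, not an existing drm interface:

#include <linux/errno.h>
#include <linux/module.h>

struct fake_driver_ops;                 /* lives in the driver's .text */

struct fake_internal_ref {
        const struct fake_driver_ops *ops;
        struct module *owner;           /* module that provides ops */
};

static int fake_internal_ref_init(struct fake_internal_ref *ref,
                                  const struct fake_driver_ops *ops,
                                  struct module *owner)
{
        if (!try_module_get(owner))     /* module already being unloaded */
                return -ENODEV;
        ref->ops = ops;
        ref->owner = owner;
        return 0;
}

static void fake_internal_ref_fini(struct fake_internal_ref *ref)
{
        ref->ops = NULL;
        module_put(ref->owner);         /* driver .text may vanish after this */
}

Note that nothing here guarantees that ops and owner actually belong
together, which is the runtime check problem I get back to below.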
> > We can and should definitely try to make this much better. I think we
> > can get to full correctness wrt the first 3 lifetime things in rust.
> > I'm not sure whether handling module unload/.text lifetime is worth
> > the bother, it's probably only going to upset developers if we try.
>
> It hurts to read a suggestion we should ignore .text lifetime rules :(
> DRM can be like this, but please don't push that mess onto the rest of
> the world in the common rust bindings or common rust design patterns.
> Especially after places have invested a lot to properly and fully fix
> these problems without EAF bugs, infinite wait problems or otherwise.
>
> My suggestion is that new DRM rust drivers should have the file
> operations isolation like RDMA does and a design goal to have revocable
> sleeps. No EAF issue. You don't have to fix the whole DRM subsystem to
> get here, just some fairly small work that only new rust drivers would
> use. Start off on a good foot.

<shrug> You've missed the "it will upset developers" part. I've seen
people remove module references that are needed, to "fix" driver
unloading.

The other part is that rust isn't magic, the compiler cannot reason
through every possible correct api. Which means that sometimes it forces
a failure path on you that you know cannot ever happen, but you cannot
teach the compiler how to prove that. You can side-step that with runtime
death in rust, aka BUG_ON(), which isn't really popular either.

The third part is that I'm not aware of anything in rust that would
guarantee that the function pointer and the module reference actually
belong to each other. Which means another runtime check most likely, and
hence another thing that shouldn't fail but which kinda can now.

Hence my conclusion that maybe it's just not the top priority to get this
all perfect.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch