On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> > - Nuke the driver binding manually through sysfs with the unbind files.
> > - Nuke all userspace that might be holding files and other resources
> >   open.
> > - At this point the module refcount should be zero and you can unload
> >   it.
> >
> > Except developers really don't like the manual unbind step, and so
> > we're missing try_module_get() in a bunch of places where it really
> > should be.
>
> IMHO they are not missing, we just have a general rule that if a
> cleanup function, required to be called prior to module exit, revokes
> any .text pointers then you don't need to hold the module refcount.
>
> file_operations doesn't have such a cleanup function, which is why it
> takes the refcount.
>
> hrtimer does have such a function, which is why it doesn't take the
> refcount.

I was talking about a bunch of other places where it works like
file_operations, except we don't bother with the module reference count.
I've seen patches fly by where people "fix" these things because module
unload is "broken".

> > Now wrt why you can't just solve this all at the subsystem level and
> > guarantee that after drm_dev_unplug no code is running in driver
> > callbacks anymore:
> >
> > In really, really simple subsystems like backlight this is doable. In
> > drm with arbitrary ioctl this isn't, and you get to make a choice:
>
> It is certainly doable, you list the right way to do it right below
> and RDMA implements that successfully.
>
> The subsystem owns all FDs and proxies all file_operations to the
> driver (after improving them :) and that is protected by a rwsem/SRCU
> that is safe against the removal path setting all driver ops to NULL.

I'm not saying that any of these approaches are bad. For some cases we
plan to use them in gpu code too, even. The above is pretty much the plan
we have for dma_fence.

> > - You wait until all driver code finishes, which could be never if
> >   there's an ioctl that waits for render to complete and doesn't
> >   handle hotunplug correctly. This is a deadlock.
>
> Meh. We waited for all FDs to close for a long time. It isn't a
> "deadlock", it is just a wait on userspace that extends to module
> unload. Very undesirable, yes, but not the end of the world, it can
> resolve itself if userspace shuts down.
>
> But, IMHO, the subsystem and driver should shoot down the waits during
> remove.
>
> Your infinite waits are all interruptible, right? :)

So yeah, userspace you can shoot down with SIGKILL, assuming really good
programming. But there are also all the in-kernel operations between
various work queues and other threads. This can be easily fixed by just
rewriting the entire thing into a strict message passing paradigm.
Unfortunately rust has to interop with the current existing mess.

gpu drivers can hog console_lock (yes, we're trying to get away from that
as much as possible), and at that point a cavalier attitude of "you can
just wait" isn't very appreciated. And once you've made sure that really
everything can bail out, you've gotten pretty close to reimplementing
revocable resources.
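To make that concrete, what I mean by a revocable resource is roughly the
following, in C terms. This is a hand-waving sketch with made-up names,
not actual drm or rust-binding code: accessors only work inside a plain
RCU read-side section (so no sleeping), and hotunplug revokes the
resource and waits for all readers before anything backing it can go
away.

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

/* illustrative only, every name in here is invented */
struct fake_hw {
        void __iomem *mmio;
};

struct fake_device {
        struct fake_hw __rcu *hw;       /* revoked (set to NULL) on hotunplug */
};

static int fake_write_reg(struct fake_device *fdev, u32 offset, u32 val)
{
        struct fake_hw *hw;
        int ret = -ENODEV;

        rcu_read_lock();                /* no sleeping allowed from here on */
        hw = rcu_dereference(fdev->hw);
        if (hw) {
                writel(val, hw->mmio + offset);
                ret = 0;
        }
        rcu_read_unlock();

        return ret;
}

static void fake_revoke(struct fake_device *fdev)
{
        RCU_INIT_POINTER(fdev->hw, NULL);
        synchronize_rcu();              /* all readers are guaranteed done */
        /* now it is safe to iounmap, free the hw state and drop .text */
}

The rust Revocable is essentially this, with the compiler making sure you
cannot forget the "is it still there" check on every single access.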
> > In my experience this is theoretically possible, practically no one
> > gets this right, and de facto it means that actual hotunplug under
> > load has a good chance of just hanging forever. Which is why drm
> > doesn't do this.
>
> See, we didn't have this problem as we don't have infinite waits in the
> driver as part of the API. The API toward the driver is event driven..

Yeah, rolling everything over to event passing and message queues would
sort this out a lot. It's kinda not where we are though.

> I can understand that adding the shootdown logic all over the place
> would be hard and you'd get it wrong.
>
> But so is half removing the driver while it is doing *anything* and
> trying to mitigate that with a different kind of hard to do locking
> fix.

*shrug* The thing is that rust helps you enormously with implementing
revocable resources and making sure you're not cheating with all the
bail-out paths. It cannot help you with making sure you have
interruptible/abortable sleeps in all the right places. Yes, this is a
bit of a disappointment, but fundamentally rust cannot model negative
contexts (unlike strictly functional languages like haskell) where
certain operations are not allowed. But it is much, much better than C at
"this could fail, you must handle it and not screw up". In some cases you
can plug this gap with runtime validation, like the fake lockdep contexts
behind the might_alloc() checks and similar tricks we're using on the C
side too.

Given that I'm still struggling with weeding out design deadlocks in
normal operation - for example runtime pm is an absolute disaster on
this, and a lot of drivers fail real bad once you add lockdep annotations
for runtime pm - I'll probably retire before I get to doing this for
driver unload.

> > This is why I like the rust Revocable so much, because it's a normal
> > rcu section, so it disallows all sleeping. You might still deadlock on
> > a busy loop waiting for hw without having a timeout. But that's
> > generally fairly easy to spot, and good drivers have macros/helpers
> > for this so that there is always a timeout.
>
> The Revocable version narrows the critical sections to very small
> regions, but having critical sections at all is still, IMHO, hacky.
>
> What you should ask Rust to solve for you is the infinite waits! That
> is the root cause of your problem. Compiler enforces no waits without
> a revocation option on DRM callbacks!
>
> Wouldn't that be much better??

It would indeed be nice. I haven't seen that rust unicorn yet though, and
from my understanding it's just not something rust can give you. Rust
isn't magic, it's just a tool that can do a few fairly specific things a
lot better than C. But otherwise it's still the same mess.

> > drm_dev_unplug uses sleepable rcu for practicality reasons and so has
> > a much, much higher chance of deadlocks. Note that strictly speaking
> > drm_device should hold a module reference on the driver, but see above
> > for why we don't have that - developers prefer convenience over
> > correctness in this area.
>
> Doesn't DRM have a module reference because the fops is in the driver
> and the file core takes the driver module reference during
> fops_get()/replace_fops() in drm_stub_open()? Or do I misunderstand
> what that stub is for?
>
> Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?

Yeah, it's there, except only for the userspace references and not for
the kernel internal ones. Because developers get a bit prickly about
adding those, unfortunately, due to "it breaks module unload". Maybe we
should just add them, at least for rust.
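Just to spell out what "adding them" would mean: any long-lived kernel
internal reference that stashes a pointer into driver .text would pin the
driver module explicitly, the same way the file core pins fops->owner on
open. Rough sketch with invented names, not an existing drm interface:

#include <linux/errno.h>
#include <linux/module.h>

struct fake_driver_ops;                 /* lives in the driver's .text */

struct fake_internal_ref {
        const struct fake_driver_ops *ops;
        struct module *owner;           /* module that provides ops */
};

static int fake_internal_ref_init(struct fake_internal_ref *ref,
                                  const struct fake_driver_ops *ops,
                                  struct module *owner)
{
        if (!try_module_get(owner))     /* module already being unloaded */
                return -ENODEV;
        ref->ops = ops;
        ref->owner = owner;
        return 0;
}

static void fake_internal_ref_fini(struct fake_internal_ref *ref)
{
        ref->ops = NULL;
        module_put(ref->owner);         /* driver .text may vanish after this */
}

Note that nothing here guarantees that ops and owner actually belong
together, which is the runtime check problem I get back to below.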
> > We can and should definitely try to make this much better. I think we
> > can get to full correctness wrt the first 3 lifetime things in rust.
> > I'm not sure whether handling module unload/.text lifetime is worth
> > the bother, it's probably only going to upset developers if we try.
>
> It hurts to read a suggestion we should ignore .text lifetime rules :(
> DRM can be like this, but please don't push that mess onto the rest of
> the world in the common rust bindings or common rust design patterns.
> Especially after places have invested a lot to properly and fully fix
> these problems without EAF bugs, infinite wait problems or otherwise.
>
> My suggestion is that new DRM rust drivers should have the file
> operations isolation like RDMA does and a design goal to have revocable
> sleeps. No EAF issue. You don't have to fix the whole DRM subsystem to
> get here, just some fairly small work that only new rust drivers would
> use. Start off on a good foot.

<shrug> You've missed the "it will upset developers" part. I've seen
people remove module references that are needed, to "fix" driver
unloading.

The other part is that rust isn't magic, the compiler cannot reason
through every possible correct api. Which means that sometimes it forces
a failure path on you that you know cannot ever happen, but you cannot
teach the compiler how to prove that. You can side-step that with runtime
death in rust, aka BUG_ON(), which isn't really popular either.

The third part is that I'm not aware of anything in rust that would
guarantee that the function pointer and the module reference actually
belong to each other. Which means another runtime check most likely, and
hence another thing that shouldn't fail but which kinda can now.

Hence my conclusion that maybe it's just not the top priority to get this
all perfect.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch