On Mon, Mar 03, 2025 at 08:36:34PM +0100, Danilo Krummrich wrote: > > > And yes, for *device resources* it is unsound if we do not ensure that the > > > device resource is actually dropped at device unbind. > > > > Why not do a runtime validation then? > > > > It would be easy to have an atomic counting how many devres objects > > are still alive. > > (1) It would not be easy at all, if not impossible. > > A Devres object doesn't know whether it's embedded in an Arc<Devres>, nor does > it know whether it is embedded in subsequent Arc containers, e.g. > Arc<Arc<Devres>>. You aren't tracking that. If Rust has a problem, implement it in C: devm_rsgc_start(struct device *) devm_rsgc_pci_iomap(struct device *,..) devm_rsgc_pci_iounmap(struct device *,..) devm_rsgc_fence(struct device *) Surely that isn't a problem for bindings? > > Properly written drives never hit it. Buggy drivers will throw a > > warning and otherwise function safely. > > Ignoring (1), I think that's exactly the opposite of what we want to achieve. > > This would mean that the Rust abstraction does *not avoid* but *only detect* the > problem. It is preventing memory leaks. We would like to prevent memory leaks in the kernel. Currently you can't even detect them, so this seems like an improvement. I am using the practical kernel definition for "memory leak" of memory outliving remove() rather than the more philisophical rust version that memory is someday, eventually, freed. In an environment like the kernel the distinction is important and desirable. > The practical problem: Buggy drivers could (as you propose) stall the > corresponding task forever, never releasing the device resource. Should't be never. Rust never leaks memory? It will eventually be unstuck when Rust frees the memory in the concurrent context that is holding it. > Not releasing the device resource may stall subsequent drivers > trying to probe the device, or, if the physical memory region has > been reassigned to another device, prevent another device from > probing. This is *not* what I would call "function safely". It didn't create a security problem. The driver could also write while (true) sleep(1) In the remove function and achieve all of the above bad things, should you try to statically prevent that too? I've written stuff like the above in drivers before, it is an easy way to make thing safe when working with FDs, so don't think nobody would do that. So I don't see alot of value in arguing you can close of 1 way to achieve something bad and still leave open all sorts of others. At least is more philisophical than I prefer to get. It is better to focus on usability, performance, and bug class detection, IMHO. Jason