On Tue, Feb 25, 2025 at 06:57:56PM -0400, Jason Gunthorpe wrote: > I think this resource-revoke idea is deviating from the normal > expected driver model by allowing driver code to continue to run in > other threads once remove completes. That is definitely abnormal at > least. No, it simply guarantees that once remove() completed the pointer to the resource can't be accessed anymore and the resource can't be kept alive (which includes the actual memory mapping as well as the allocated resource region). It also solves the unplug problem, where ioctls can't access the resource anymore after remove(). This is indeed a problem that does not affect all subsystems. > > It is not necessarily *wrong*, but it sure is weird and as I explained > above it has bad system level properties. > > Further, it seems to me there is a very unique DRM specific issue at > work "time unbounded driver callbacks". A weird solution to this > should not be baked into the common core kernel rust bindings and > break the working model of all other subsystems that don't have that > problem. > > > > Similarly you can have custom functions for short sequences of I/O ops, or use > > > closures. I don't understand the concern. > > > > Yeah, this is certainly possible. I think one concern is similar to what you > > raised on the other thread you shared [1]: > > "Maybe we even want to replace it with SRCU entirely to ensure that drivers > > can't stall the RCU grace period for too long by accident." > > I'd be worried about introducing a whole bunch more untestable failure > paths in drivers. Especially in areas like work queue submit that are > designed not to fail [2]. Non-failing work queues is a critical property > that I've relied on countless times. I'm not sure you even *can* recover > from this correctly in all cases. > > Then in the other email did you say that even some memory allocations > go into this scheme? Yikes! "For instance, let's take devm_kzalloc(). Once the device is detached from the driver the memory allocated with this function is freed automatically. The additional step in Rust is, that we'd not only free the memory, but also revoke the access to the pointer that has been returned by devm_kzalloc() for the driver, such that it can't be used by accident anymore." This was just an analogy to explain what we're doing here. Obviously, memory allocations can be managed by Rust's object lifetime management. The reason we have Devres for device resources is that the lifetime of a pci::Bar is *not* bound to the object lifetime directly, but to the lifetime of the binding between a device and a driver. That's why it needs to be revoked (which forcefully drops the object) when the device is unbound *not* when the pci::Bar object is dropped regularly. That's all the magic we're doing here. And again, this is not a change to the device / driver model. It is making use of the device / driver model to ensure safety. > > Further, hiding a synchronize_rcu in a devm destructor [3], once per > revocable object is awful. If you imagine having a rcu around each of > your revocable objects, how many synchronize_rcu()s is devm going to > call post-remove()? As many as you have MMIO mappings in your driver. But we can probably optimize this to just a single synchronize_rcu().