On Tue, Jan 14, 2025 at 05:33:45PM +0100, Danilo Krummrich wrote: > > > +impl<T> Drop for Devres<T> { > > > + fn drop(&mut self) { > > > + // Revoke the data, such that it gets dropped already and the actual resource is freed. > > > + // > > > + // `DevresInner` has to stay alive until the devres callback has been called. This is > > > + // necessary since we don't know when `Devres` is dropped and calling > > > + // `devm_remove_action()` instead could race with `devres_release_all()`. > > > > IIUC, the outcome of that race is the `WARN` if > > devres_release_all takes the spinlock first and has already remvoed the > > action? > > > > Could you do a custom devres_release here that mimick > > `devm_remove_action` but omit the `WARN`? This way it allows the memory > > behind DevresInner to be freed early without keeping it allocated until > > the end of device lifetime. > > Better late than never, I now remember what's the *actual* race I was preventing > here: > > | Task 0 | Task 1 > --|---------------------------------------------------------------------------- > 1 | Devres::drop() { | devres_release_all() { > 2 | DevresInner::remove_action() { | spin_lock_irqsave(); > 3 | | remove_nodes(&todo); > 4 | | spin_unlock_irqrestore(); > 5 | devm_remove_action_nowarn(); | > 6 | let _ = Arc::from_raw(); | > 7 | } | > 8 | } | > 9 | | release_nodes(&todo); > 10| | } > > So, if devres_release_all() wins the race it stores the devres action within the > temporary todo list. Which means that in 9 we enter > DevresInner::devres_callback() even though DevresInner has been freed already. > > Unfortunately, this race can happen with [1], but not with this original version > of Devres. Well, actually this can't happen, since obviously we know whether Devres:drop removed it or whether it will be released by devres_release_all() and we only free DevresInner conditionally. So, the code in [1] is fine; I somehow managed to confuse myself. Sorry for the noise. - Danilo > > I see two ways to fix it: > > 1) Just revert [1] and stick with the original version. > > 2) Use devm_release_action() instead of devm_remove_action() and don't drop the > reference in DevresInner::remove_action() (6). This way the reference is > always dropped from the callback. > > With 2) we still have an unnecessary call to revoke() if Devres is dropped > before the device is detached from the driver, but that's still better than > keeping DevresInner alive until the device is detached from the driver, which > the original version did. Hence, I'll go ahead and prepare a patch for this, > unless there are any concerns. > > - Danilo > > [1] https://lore.kernel.org/rust-for-linux/20250107122609.8135-2-dakr@xxxxxxxxxx/