On Tue, Feb 25, 2025 at 05:09:35PM +0100, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> >
> >
> > On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> > >> Hi Danilo,
> > >>
> > >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> > >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> > >>>> CC: Gary
> > >>>>
> > >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> > >>>>> This inability to sleep while we are accessing registers seems very
> > >>>>> constraining to me, if not dangerous. It is pretty common to have
> > >>>>> functions intermingle hardware accesses with other operations that might
> > >>>>> sleep, and this constraint means that in such cases the caller would
> > >>>>> need to perform guard lifetime management manually:
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   /* do something that might sleep */
> > >>>>>
> > >>>>>   let bar_guard = bar.try_access()?;
> > >>>>>   /* do something non-sleeping with bar_guard */
> > >>>>>   drop(bar_guard);
> > >>>>>
> > >>>>>   ...
> > >>>>>
> > >>>>> Failure to drop the guard potentially introduces a race condition, which
> > >>>>> will receive no compile-time warning and potentially not even a runtime
> > >>>>> one unless lockdep is enabled. This problem does not exist with the
> > >>>>> equivalent C code AFAICT
> > >>>
> > >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> > >>> not call into something that might sleep from atomic context.
> > >>>
> > >>
> > >> Sure, but in C, a sequence of MMIO accesses doesn't need to be constrained to
> > >> not sleeping?
> > >
> > > It's not that MMIO needs to be constrained to not sleeping in Rust either.
> > > It's just that the synchronization mechanism (RCU) used for the Revocable
> > > type implies that.
> > >
> > > In C we have something that is pretty similar with drm_dev_enter() /
> > > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > >
> > > In DRM this is used to prevent accesses to device resources after the device has
> > > been unplugged.
> >
> > Thanks a lot for the response. Might it make more sense to use SRCU then? The
> > use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.
>
> Allowing to hold on to the guard for too long is a bit contradictory to the goal
> of detecting hotunplug I guess.
>
> Besides that I don't really see why we can't just re-acquire it after we sleep?
> Rust provides good options to implement it ergonomically I think.
>
> >
> > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
>
> See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?
>
> [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

Hmm, so you're saying SRCU sections blocking indefinitely is a concern as per
that thread. But I think SRCU GPs should not be stalled in normal operation. If
they are, that is a bug anyway. Stalling SRCU grace periods is not a good thing
either: you could run out of memory (even though stalling RCU is even more
dangerous).

For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct me
if I'm wrong about the RDMA user but this is what I recollect discussing with
you.

> > >
> > >> I am fairly new to rust, could you help elaborate more about why these MMIO
> > >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> > >> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> > >> vaguely understand we are trying to "wait for an MMIO access" using
> > >> synchronize here, but it is just a guess.
> > >
> > > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > > to unplugged devices.
> > >
> > > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > > is detached from the driver (for instance because it has been unplugged).
> >
> > I guess the Devres concept of revoking resources on driver detach is not a rust
> > thing (even for PCI)... but correct me if I'm wrong.
>
> I'm not sure what you mean with that, can you expand a bit?

I was reading the devres documentation earlier. It mentions that one of its
uses is to clean up resources. Maybe I mixed up the meaning of "clean up" and
"revoke" as I was reading it. Honestly, I am still a bit confused by the
difference between "revoking" and "cleaning up".

> >
> > > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > > unmapped; a subsequent call to try_access() would fail.
> > >
> > > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > > own; that's also why keeping the critical sections short is desirable.
> >
> > I have heard some concern around whether Rust is changing the driver model when
> > it comes to driver detach / driver remove. Can you maybe elaborate a bit about
> > how Rust changes that mechanism versus C, when it comes to that?
>
> I think that one is simple, Rust does *not* change the driver model.
>
> What makes you think so?

Well, the revocable concept for one is rust-only, right? It is also possibly
just some paranoia based on discussions, but I'm not sure at the moment.

> > Ideally we
> > would not want Rust drivers to have races with user space accesses when they are
> > detached/removed. But we also don't want accesses to be non-sleepable sections
> > where this guard is held, it seems restrictive (though to your point the
> > sections are expected to be small).
>
> In the very extreme case, nothing prevents you from implementing a wrapper like:
>
>   fn my_read32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
>       // The guard lives only for the duration of this function.
>       let bar = bar.try_access()?;
>       Ok(bar.read32(offset))
>   }
>
> Which limits the RCU read side critical section to my_read32().
>
> Similarly you can have custom functions for short sequences of I/O ops, or use
> closures. I don't understand the concern.

Yeah, this is certainly possible. I think one concern is similar to what you
raised on the other thread you shared [1]:

"Maybe we even want to replace it with SRCU entirely to ensure that drivers
can't stall the RCU grace period for too long by accident."

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

thanks,

 - Joel