Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

Danilo Krummrich <dakr@xxxxxxxxxx> · Tue, 25 Feb 2025 17:09:35 +0100

On Tue, Feb 25, 2025 at 10:52:41AM -0500, Joel Fernandes wrote:
> 
> 
> On 2/24/2025 6:44 PM, Danilo Krummrich wrote:
> > On Mon, Feb 24, 2025 at 01:45:02PM -0500, Joel Fernandes wrote:
> >> Hi Danilo,
> >>
> >> On Mon, Feb 24, 2025 at 01:11:17PM +0100, Danilo Krummrich wrote:
> >>> On Mon, Feb 24, 2025 at 01:07:19PM +0100, Danilo Krummrich wrote:
> >>>> CC: Gary
> >>>>
> >>>> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >>>>> This inability to sleep while we are accessing registers seems very
> >>>>> constraining to me, if not dangerous. It is pretty common to have
> >>>>> functions intermingle hardware accesses with other operations that might
> >>>>> sleep, and this constraint means that in such cases the caller would
> >>>>> need to perform guard lifetime management manually:
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   /* do something that might sleep */
> >>>>>
> >>>>>   let bar_guard = bar.try_access()?;
> >>>>>   /* do something non-sleeping with bar_guard */
> >>>>>   drop(bar_guard);
> >>>>>
> >>>>>   ...
> >>>>>
> >>>>> Failure to drop the guard potentially introduces a race condition, which
> >>>>> will receive no compile-time warning and potentialy not even a runtime
> >>>>> one unless lockdep is enabled. This problem does not exist with the
> >>>>> equivalent C code AFAICT
> >>>
> >>> Without klint [1] it is exactly the same as in C, where I have to remember to
> >>> not call into something that might sleep from atomic context.
> >>>
> >>
> >> Sure, but in C, a sequence of MMIO accesses don't need to be constrained to
> >> not sleeping?
> > 
> > It's not that MMIO needs to be constrained to not sleeping in Rust either. It's
> > just that the synchronization mechanism (RCU) used for the Revocable type
> > implies that.
> > 
> > In C we have something that is pretty similar with drm_dev_enter() /
> > drm_dev_exit() even though it is using SRCU instead and is specialized to DRM.
> > 
> > In DRM this is used to prevent accesses to device resources after the device has
> > been unplugged.
> 
> Thanks a lot for the response. Might it make more sense to use SRCU then? The
> use of RCU seems overly restrictive due to the no-sleep-while-guard-held thing.

Allowing to hold on to the guard for too long is a bit contradictive to the goal
of detecting hotunplug I guess.

Besides that I don't really see why we can't just re-acquire it after we sleep?
Rust provides good options to implement it ergonimcally I think.

> 
> Another colleague told me RDMA also uses SRCU for a similar purpose as well.

See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?

[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/

> 
> >> I am fairly new to rust, could you help elaborate more about why these MMIO
> >> accesses need to have RevocableGuard in Rust? What problem are we trying to
> >> solve that C has but Rust doesn't with the aid of a RCU read-side section? I
> >> vaguely understand we are trying to "wait for an MMIO access" using
> >> synchronize here, but it is just a guest.
> > 
> > Similar to the above, in Rust it's a safety constraint to prevent MMIO accesses
> > to unplugged devices.
> > 
> > The exact type in Rust in this case is Devres<pci::Bar>. Within Devres, the
> > pci::Bar is placed in a Revocable. The Revocable is revoked when the device
> > is detached from the driver (for instance because it has been unplugged).
> 
> I guess the Devres concept of revoking resources on driver detach is not a rust
> thing (even for PCI)... but correct me if I'm wrong.

I'm not sure what you mean with that, can you expand a bit?

> 
> > By revoking the Revocable, the pci::Bar is dropped, which implies that it's also
> > unmapped; a subsequent call to try_access() would fail.
> > 
> > But yes, if the device is unplugged while holding the RCU guard, one is on their
> > own; that's also why keeping the critical sections short is desirable.
> 
> I have heard some concern around whether Rust is changing the driver model when
> it comes to driver detach / driver remove.  Can you elaborate may be a bit about
> how Rust changes that mechanism versus C, when it comes to that?

I think that one is simple, Rust does *not* change the driver model.

What makes you think so?

> Ideally we
> would not want Rust drivers to have races with user space accesses when they are
> detached/remove. But we also don't want accesses to be non-sleepable sections
> where this guard is held, it seems restrictive (though to your point the
> sections are expected to be small).

In the very extreme case, nothing prevents you from implementing a wrapper like:

	fn my_write32(bar: &Devres<pci::Bar>, offset: usize) -> Result<u32> {
		let bar = bar.try_access()?;
		bar.read32(offset);
	}

Which limits the RCU read side critical section to my_write32().

Similarly you can have custom functions for short sequences of I/O ops, or use
closures. I don't understand the concern.