Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
>> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
>> > CC: Gary
>> >
>> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> >> This inability to sleep while we are accessing registers seems very
>> >> constraining to me, if not dangerous. It is pretty common to have
>> >> functions intermingle hardware accesses with other operations that might
>> >> sleep, and this constraint means that in such cases the caller would
>> >> need to perform guard lifetime management manually:
>> >> 
>> >>   let bar_guard = bar.try_access()?;
>> >>   /* do something non-sleeping with bar_guard */
>> >>   drop(bar_guard);
>> >> 
>> >>   /* do something that might sleep */
>> >> 
>> >>   let bar_guard = bar.try_access()?;
>> >>   /* do something non-sleeping with bar_guard */
>> >>   drop(bar_guard);
>> >> 
>> >>   ...
>> >> 
>> >> Failure to drop the guard potentially introduces a race condition, which
>> >> will receive no compile-time warning and potentially not even a runtime
>> >> one unless lockdep is enabled. This problem does not exist with the
>> >> equivalent C code AFAICT, which makes the Rust version actually more
>> >> error-prone and dangerous, the opposite of what we are trying to achieve
>> >> with Rust. Or am I missing something?
>> >
>> > Generally you are right, but you have to see it from a different perspective.
>> >
>> > What you describe is not an issue that comes from the design of the API, but is
>> > a limitation of Rust in the kernel. People are aware of the issue and with klint
>> > [1] there are solutions for that in the pipeline, see also [2] and [3].
>> >
>> > [1] https://rust-for-linux.com/klint
>> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
>> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
>> 
>> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
>> its own admission. But even if we ignore the safety issue, the other one
>> (ergonomics) is still there.
>> 
>> Basically this way of accessing registers imposes quite a mental burden on its
>> users. It requires a very different (and harsher) discipline than when writing
>> the same code in C
>
> We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
> drm_dev_unplug().

Granted, but the use of these is much more coarse-grained than what is
expected of IO resources, right?

>
>> and the correct granularity to use is unclear to me.
>> 
>> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
>> particular register in a busy loop, should I call try_access() once before the
>> loop? Or every time before accessing the register?
>
> I think we should re-acquire the guard in each iteration and drop it before the
> delay. I think a simple closure would work very well for this pattern?
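
Something along those lines would indeed keep the guard lifetime under
control. A rough sketch of what I have in mind, against the current
Devres::try_access() (Bar0, readl(), TIMEOUT_TRIES and the delay helper
are illustrative names here, not necessarily the real API; error codes
would come from kernel::error::code):

  // Re-acquire the guard on every poll iteration so it is never held
  // across the sleep between polls.
  fn poll_reg<F>(bar: &Devres<Bar0>, offset: usize, done: F) -> Result
  where
      F: Fn(u32) -> bool,
  {
      for _ in 0..TIMEOUT_TRIES {
          {
              /* RCU read-side critical section: no sleeping in here */
              let guard = bar.try_access().ok_or(ENXIO)?;
              if done(guard.readl(offset)) {
                  return Ok(());
              }
          } /* guard dropped here, before the delay */
          sleep_poll_interval(); /* illustrative sleeping delay helper */
      }
      Err(ETIMEDOUT)
  }

At least a helper like this keeps the re-acquisition discipline in a
single place instead of spreading it over every caller.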
>
>> I'm afraid having to check
>> that the resource is still alive before accessing any register is going to
>> become tedious very quickly.
>> 
>> I understand that we want to protect against accessing the IO region of an
>> unplugged device; but still there is no guarantee that the device won't be
>> unplugged in the middle of a critical section, however short. Thus the driver
>> code should be able to recognize that the device has fallen off the bus when it
>> e.g. gets a bunch of 0xff instead of a valid value. So do we really need the
>> extra protection that AFAICT isn't used in C?
>
> As mentioned above, we already do similar things in C.
>
> Also, think about what the alternative is. If we remove the Devres wrapper of
> pci::Bar, we lose control over the lifetime of the bar mapping and it can
> easily outlive the device / driver binding. This makes the API unsound.

Oh, my issue is not with the Devres wrapper itself, which I think makes
sense; it's more the use of RCU to control access to the resource that I
find too constraining. And I'm pretty sure more users will be of the same
opinion as more drivers using it get written.

>
> With this, drivers would be able to keep resources acquired. What if, after a
> hotplug, the physical address region is re-used and mapped by another
> driver?

Actually, wouldn't that issue also be addressed by a PCI equivalent to
drm_dev_enter() and friends that ensures the device (and thus its devres
resources) stays in place?

Using Rust, I can imagine (though not picture precisely yet) some method of
the device that returns a reference to an inner structure containing its
resources, available for immediate access. Since it would be
coarser-grained, it could rely on something less constraining than RCU
without a noticeable performance penalty.
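
Very hypothetically, something like this; enter(), the returned token and
resources() are all made-up names, and which primitive would back them
(SRCU, a sleepable rwsem, ...) is an open question:

  // Hypothetical coarser-grained scheme, modeled on drm_dev_enter() /
  // drm_dev_exit(): while the token returned by enter() is alive,
  // revocation of the device's resources is held off, so register
  // accesses need no per-access check and sleeping in between is fine.
  fn do_init(dev: &MyDevice) -> Result {
      let entered = dev.enter()?;      // fails if already unplugged
      let bar = entered.resources();   // direct access, no try_access()

      bar.writel(1, SOME_REG);         // illustrative accessor/offset
      function_that_may_sleep()?;      // fine: no RCU read lock is held
      bar.writel(2, SOME_OTHER_REG);

      Ok(())
  } // `entered` dropped: unplug can now proceed and revoke the mapping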



