Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

On Fri, Feb 28, 2025 at 02:55:34PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 11:40:53PM +0100, Danilo Krummrich wrote:
> > On Thu, Feb 27, 2025 at 06:00:13PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 27, 2025 at 01:25:10PM -0800, Boqun Feng wrote:
> > > > 
> > > > In most cases, this should be achieved naturally, because you already
> > > > bind the objects to your module or driver; otherwise they would
> > > > already have been cancelled and freed.
> > > 
> > > I'm getting the feeling you can probably naturally achieve the
> > > required destructors, but I think Danilo is concerned that since it
> > > isn't *mandatory* it isn't safe/sound.
> > 
> > Of course you can "naturally" achieve the required destructors, I even explained
> > that in [1]. :-)
> > 
> > And yes, for *device resources* it is unsound if we do not ensure that the
> > device resource is actually dropped at device unbind.
> 
> Why not do a runtime validation then?
> 
> It would be easy to have an atomic counting how many devres objects
> are still alive.

(1) It would not be easy at all, if not impossible.

A Devres object doesn't know whether it's embedded in an Arc<Devres>, nor does
it know whether that Arc is itself embedded in further Arc containers, e.g.
Arc<Arc<Devres>>.

It is impossible for a Devres object to have a global view of how many
references keep it alive.
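To illustrate (1) in userspace terms, here is a minimal sketch in plain std
Rust, not kernel code (`FakeDevres` and `nested_counts` are hypothetical
names). A value has no access to the reference count of the `Arc` it lives in,
and even that count does not reflect outer containers: cloning an
`Arc<Arc<T>>` only bumps the outer count.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a device resource; NOT the kernel's `Devres`.
struct FakeDevres;

/// Wraps the resource in an `Arc`, wraps that `Arc` in another `Arc`, and
/// clones the outer one `extra_holders` times.
/// Returns (inner strong count, outer strong count).
fn nested_counts(extra_holders: usize) -> (usize, usize) {
    let inner: Arc<FakeDevres> = Arc::new(FakeDevres);
    let outer: Arc<Arc<FakeDevres>> = Arc::new(inner);

    // Every clone of `outer` keeps the resource alive, but only the outer
    // reference count is incremented; the inner count is untouched.
    let holders: Vec<Arc<Arc<FakeDevres>>> =
        (0..extra_holders).map(|_| Arc::clone(&outer)).collect();

    let inner_count = Arc::strong_count(&*outer);
    let outer_count = Arc::strong_count(&outer);
    drop(holders);
    (inner_count, outer_count)
}

fn main() {
    let (inner, outer) = nested_counts(3);
    // The innermost Arc reports one owner, although four handles
    // (the original plus three clones) keep the resource alive.
    println!("inner = {inner}, outer = {outer}");
}
```

With three extra clones, the inner `Arc` still reports a single owner while
four handles keep the resource alive, so a counter kept next to the resource
cannot see the real number of holders.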

> 
> Have remove() WARN_ON to the atomic and a dumb sleep loop until it is 0.
> 
> Properly written drivers never hit it. Buggy drivers will throw a
> warning and otherwise function safely.

Ignoring (1), I think that's exactly the opposite of what we want to achieve.

This would mean that the Rust abstraction does not *avoid* the problem, but
only *detects* it.

The formal problem: The resulting API would be unsound by definition.

The practical problem: Buggy drivers could (as you propose) stall the
corresponding task forever, never releasing the device resource. Not releasing
the device resource may stall subsequent drivers trying to probe the device, or,
if the physical memory region has been reassigned to another device, prevent
another device from probing. This is *not* what I would call "function safely".
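A rough userspace model of the proposed runtime check, assuming a single atomic
counter of live resources. `LiveCounter`, `Tracked` and `remove` are
hypothetical names; the proposal itself would use WARN_ON() and sleep without
any deadline. The deadline below exists only so the example terminates: without
it, a leaked handle keeps remove() polling forever and the resource is never
released.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical counter of live device resources (models the proposal).
struct LiveCounter(AtomicUsize);

/// A resource handle that decrements the counter when dropped.
struct Tracked {
    counter: Arc<LiveCounter>,
}

impl Tracked {
    fn new(counter: &Arc<LiveCounter>) -> Self {
        counter.0.fetch_add(1, Ordering::SeqCst);
        Tracked { counter: Arc::clone(counter) }
    }
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.counter.0.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Sketch of the proposed remove(): warn, then sleep-poll until the count
/// reaches zero. Returns true if all resources were released in time.
fn remove(counter: &LiveCounter, deadline: Duration) -> bool {
    let start = Instant::now();
    if counter.0.load(Ordering::SeqCst) != 0 {
        eprintln!("WARNING: device resources still alive at remove()");
    }
    while counter.0.load(Ordering::SeqCst) != 0 {
        if start.elapsed() >= deadline {
            return false; // without a deadline this loop never ends
        }
        thread::sleep(Duration::from_millis(1));
    }
    true
}

fn main() {
    let counter = Arc::new(LiveCounter(AtomicUsize::new(0)));

    // Well-behaved driver: the handle is dropped before remove().
    drop(Tracked::new(&counter));
    assert!(remove(&counter, Duration::from_millis(50)));

    // Buggy driver: the handle is leaked, the count never reaches zero,
    // and the resource is never released.
    std::mem::forget(Tracked::new(&counter));
    assert!(!remove(&counter, Duration::from_millis(50)));
}
```

The check only *detects* the leak; it cannot release the resource on the buggy
driver's behalf.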

With the current API nothing of that kind is possible at all. And that is what
we want to achieve as far as possible: make Rust driver APIs robust enough that
even buggy drivers can't mess up the whole kernel. Especially for a monolithic
kernel this seems quite desirable.
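For contrast, a simplified sketch of the revocation idea behind the current
API, in plain std Rust. The in-kernel `Devres` is built on `Revocable`, which
relies on RCU rather than a lock; the `RwLock<Option<T>>` here and all names
(`RevocableRes`, `Mapping`) are assumptions made to keep the example
self-contained. The point is that unbind drops the resource regardless of how
many references remain, and later access attempts simply fail.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified model of a revocable device resource.
struct RevocableRes<T> {
    inner: RwLock<Option<T>>,
}

impl<T> RevocableRes<T> {
    fn new(value: T) -> Self {
        RevocableRes { inner: RwLock::new(Some(value)) }
    }

    /// Runs `f` on the resource if it has not been revoked yet.
    fn try_access<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        let guard = self.inner.read().unwrap();
        guard.as_ref().map(f)
    }

    /// Called at unbind: drops the resource no matter how many `Arc`s
    /// to this wrapper are still held elsewhere.
    fn revoke(&self) {
        let taken = self.inner.write().unwrap().take();
        drop(taken); // dropped after the write lock is released
    }
}

/// Hypothetical MMIO-like resource that logs when it is released.
struct Mapping {
    base: usize,
}

impl Drop for Mapping {
    fn drop(&mut self) {
        println!("mapping at {:#x} released", self.base);
    }
}

fn main() {
    let res = Arc::new(RevocableRes::new(Mapping { base: 0x1000 }));

    // A buggy driver stashes an extra reference somewhere long-lived.
    let leaked = Arc::clone(&res);
    assert_eq!(res.try_access(|m| m.base), Some(0x1000));

    // Device unbind: the mapping is released immediately...
    res.revoke();

    // ...and the leaked reference can no longer reach the hardware.
    assert_eq!(leaked.try_access(|m| m.base), None);
}
```

Unlike the counter-based check, the leaked `Arc` neither blocks unbind nor
keeps the resource alive; it only keeps an empty wrapper around.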

- Danilo


