Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 27, 2025 at 11:07:09AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 12:32:45PM +0100, Danilo Krummrich wrote:
> > On Wed, Feb 26, 2025 at 07:47:30PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 26, 2025 at 10:31:10PM +0100, Danilo Krummrich wrote:
> > > > Let's take a step back and look again why we have Devres (and Revocable) for
> > > > e.g. pci::Bar.
> > > > 
> > > > The device / driver model requires that device resources are only held by a
> > > > driver, as long as the driver is bound to the device.
> > > > 
> > > > For instance, in C we achieve this by calling
> > > > 
> > > > 	pci_iounmap()
> > > > 	pci_release_region()
> > > > 
> > > > from remove().
> > > > 
> > > > We rely on this, we trust drivers to actually do this.
> > > 
> > > Right, exactly
> > > 
> > > But it is not just PCI bar. There are a *huge* number of kernel APIs
> > > that have built in to them the same sort of requirement - teardown
> > > MUST run with remove, and once done the resource cannot be used by
> > > another thread.
> > > 
> > > Basically most things involving function pointers has this sort of
> > > lifecycle requirement because it is a common process that prevents a
> > > EAF of module unload.
> > 
> > You're still mixing topics, the whole Devres<pci::Bar> thing as about limiting
> > object lifetime to the point where the driver is unbound.
> > 
> > Shutting down asynchronous execution of things, i.e. workqueues, timers, IOCTLs
> > to prevent unexpected access to the module .text section is a whole different
> > topic.
> 
> Again, the standard kernel design pattern is to put these things
> together so that shutdown isolates concurrency which permits free
> without UAF.
> 
> > In other words, assuming that we properly enforce that there are no async
> > execution paths after remove() or module_exit() (not necessarily the same),
> > we still need to ensure that a pci::Bar object does not outlive remove().
> 
> Yes, you just have to somehow use rust to ensure a call pci_iounmap()
> happens during remove, after the isolation.
> 
> You are already doing it with devm.  It seems to me the only problem
> you have is nobody has invented a way in rust to contract that the devm
> won't run until the threads are isolated.

You can do that, pci::Driver::probe() returns a Pin<KBox<Self>>. This object is
dropped when the device is unbound and it runs before the devres callbacks.

Using miscdevice as example, your MiscDeviceRegistration would be a member of
this object and hence dropped on remove() before the devres callbacks revoke
device resources.

> 
> I don't see this as insolvable, you can have some input argument to
> any API that creates concurrency that also pushes an ordered
> destructor to the struct device lifecycle that ensures it cancels that
> concurrency.
> 
> > Device resources are a bit special, since their lifetime must be cap'd at device
> > unbind, *independent* of the object lifetime they reside in. Hence the Devres
> > container.
> 
> I'd argue many resources should be limited to device unbind. Memory is
> perhaps the only exception.

There is a difference between should and must. A driver is fully free to bind
the lifetime of a miscdevice to either to the driver lifetime (object returned
by probe) or the module lifetime, both can be valid. That's a question of
semantics.

A device resource though is only allowed to be held by a driver as long as the
corresponding device is bound to the driver. Hence an API that does not ensure
that the pci::Bar is actually, forcefully dropped on device unbind is unsound.

So, let me ask you again, how do you ensure that a pci::Bar is dropped on device
unbind if we hand it out without the Devres container?

> 
> > > My fear, that is intensifying as we go through this discussion, is
> > > that rust binding authors have not fully comprehended what the kernel
> > > life cycle model and common design pattern actually is, and have not
> > > fully thought through issues like module unload creating a lifetime
> > > cycle for *function pointers*.
> > 
> > I do *not* see where you take the evidence from to make such a generic
> > statement.
> 
> Well, I take the basic insistance that is OK to leak stuff from driver
> scope to module scope is not well designed.
> 

Who insists on leaking stuff from driver scope to module scope is OK?



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux