On Sun, Jul 28, 2024 at 08:04:52PM -0300, Jason Gunthorpe wrote: > On Sun, Jul 28, 2024 at 11:34:14PM +0200, Danilo Krummrich wrote: > > On Sun, Jul 28, 2024 at 03:13:08PM -0300, Jason Gunthorpe wrote: > > > I think we're on the same page with all that. As clarified in [1], that's not a > > big concern, I was referring to the changes required to integrate the auxbus > > stuff. > > Well, I see this thread having the realization that things are not > setup proeprly to use devres. We could use it in Nouveau without issue, it starts to become an issue when moving to the auxbus design because userspace isn't aware. But again, if we figure out that switching to auxbus is worth it, that is totally fine. > To be fair devres creates almost as many > bugs as it solves :\ cleanup.h is possibly a better option for most > simple things and harder to misuse... I agree cleanup.h is useful, it has a whole different purpose though. > > > > normal (though most subsystems would call that unregister, not put) > > > > A DRM device is reference counted and can out-live the driver, hence the > > drm_dev_put() call in .remove(). There is also a special drm_dev_unplug() > > function, which does not only unregister the DRM device, but also sets a guard > > to be able prevent HW accesses after the HW is accessible anymore. > > Every subsystem has a refcounted object, struct device is inherently > refcounted. You call the thing driver calls during .remove() > 'unregister' because it is special. Once it returns the subsystem has > to promise no more code is running in driver callbacks and the driver > is permitted to start destroying anything it might need to use when > processing any callbacks. I'm well aware, that is not the case for DRM though. Again, a DRM device can (and is allowed to) out-live the driver. There can even still be calls into the driver after it has been unregistered. This is where drm_dev_unplug(), drm_dev_enter() and drm_dev_exit() [1] are used to guard the places where device resources, such as MMIO mappings, are accessed. [1] https://elixir.bootlin.com/linux/v6.10/source/drivers/gpu/drm/drm_drv.c#L436 > > This is really tricky and people routinely misunderstand the > requirements and get this wrong. The consequence is UAF problems in > obscure cases with unbind races (that few actually care about), but As for memory safety in this context, DRM has it's own managed memory allocators, which tie memory allocations to the lifetime of a DRM device. This stuff is called the DRM managed API [2]. Some structures have to be allocated with, e.g. drmm_kzalloc(), for this purpose. [2] https://elixir.bootlin.com/linux/v6.10/source/drivers/gpu/drm/drm_managed.c#L21 > getting it right starts with labeling things properly :) As for DRM, I think everything is labeled properly. > > We went through this long ago in RDMA because someone actually had a > usecase of live driver unbind, making that work reliably under a full > active work load took some thoughtfulness. And so did DRM. :) > > Jason >