Re: devm actions and hw clenaup (was Re: [PATCH 01/11] drm: Add devm_drm_dev_init/register)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 29, 2019 at 6:36 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote:
> > On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> > >
> > >
> > > Den 24.01.2019 18.57, skrev Daniel Vetter:
> > > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >>
> > > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > > >>>>
> > > >>>>
> > > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@xxxxxxxxxxx> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > > >>>>>>>>>>> drm_dev_register().
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > > >>>>>>>>>>> fbdev emulation as well.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> <snip>
> > > >>>>>>>>
> > > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> > > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> @@ -36,6 +36,7 @@
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  #include <drm/drm_client.h>
> > > >>>>>>>>>>>  #include <drm/drm_drv.h>
> > > >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> > > >>>>>>>>>>>  #include <drm/drmP.h>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  #include "drm_crtc_internal.h"
> > > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > > >>>>>>>>>>>  }
> > > >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> > > >>>>>>>>>>> +{
> > > >>>>>>>>>>> + drm_dev_put(data);
> > > >>>>>>>>>
> > > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> > > >>>>>>>>
> > > >>>>>>>> This function is only used to cover the error path if probe fails before
> > > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > > >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> > > >>>>>>>
> > > >>>>>>> I think I get a prize for being ignorant and blind :-/
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>> +}
> > > >>>>>>>>>>> +
> > > >>>>>>>>>>> +/**
> > > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > > >>>>>>>>>>> + * @parent: Parent device object
> > > >>>>>>>>>>> + * @dev: DRM device
> > > >>>>>>>>>>> + * @driver: DRM driver
> > > >>>>>>>>>>> + *
> > > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > > >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> > > >>>>>>>>>
> > > >>>>>>>>> I think a bit more clarity here would be good:
> > > >>>>>>>>>
> > > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > > >>>>>>>>>
> > > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > > >>>>>>>>>
> > > >>>>>>>>> I think a loud warning for these is in order:
> > > >>>>>>>>>
> > > >>>>>>>>> "WARNING:
> > > >>>>>>>>>
> > > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> > > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> > > >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> > > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > > >>>>>>>>> refcount through drm_dev_put().
> > > >>>>>>>>>
> > > >>>>>>>>> "All other drm structures must still be explicitly released in the
> > > >>>>>>>>> &drm_driver.release callback."
> > > >>>>>>>>>
> > > >>>>>>>>> While thinking about this I just realized that with this design we have no
> > > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > > >>>>>>>>> place to call it:
> > > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > > >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > > >>>>>>>>>   down anything.
> > > >>>>>>>>> - drm_driver.release is way too late.
> > > >>>>>>>>>
> > > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> > > >>>>>>>>> happens automatically and in the right order.
> > > >>>>>>>>>
> > > >>>>>>>>> So not sure what to do here really.
> > > >>>>>>>>
> > > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> > > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> > > >>>>>>>> helper instead?)
> > > >>>>>>>
> > > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > > >>>>>>> and in the right order. So:
> > > >>>>>>>
> > > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> > > >>>>>>>
> > > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > > >>>>>>>
> > > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > > >>>>>>>
> > > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > > >>>>>>>   can call drm_dev_unplug() unconditionally).
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> Beautiful! I really like this, it's very flexible.
> > > >>>>>>
> > > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> > > >>>>>> atomic helper...
> > > >>>>>
> > > >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> > > >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> > > >>>>> normal cleanup I think it'd be good to have it all together in one
> > > >>>>> place. And perhaps even a code example in the DOC: overview.
> > > >>>>>
> > > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > > >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> > > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > > >>>>>>> drm_mode_config_reset would be:
> > > >>>>>>>
> > > >>>>>>> {
> > > >>>>>>>       if (drm_dev_enter())
> > > >>>>>>>               return;
> > > >>>>>>>
> > > >>>>>>>       drm_atomic_helper_shutdown();
> > > >>>>>>>
> > > >>>>>>>       drm_dev_exit();
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> > > >>>>>> registered or not, it doesn't say anything about the state of the parent
> > > >>>>>> device.
> > > >>>>>>
> > > >>>>>> All we know is that the device is being unbound from the driver, we
> > > >>>>>> don't know if it's the device that's being removed or if it's the driver
> > > >>>>>> that's unregistered.
> > > >>>>>
> > > >>>>> You're right, both paths will have called drm_dev_unplug by then.
> > > >>>>> Silly me. I really liked my idea :-)
> > > >>>>>
> > > >>>>>> I have looked at the various call chains:
> > > >>>>>>
> > > >>>>>> driver_unregister ->
> > > >>>>>>     bus_remove_driver ->
> > > >>>>>>         driver_detach ->
> > > >>>>>>             device_release_driver_internal
> > > >>>>>>
> > > >>>>>> device_unregister ->
> > > >>>>>>     device_del ->
> > > >>>>>>         bus_remove_device ->
> > > >>>>>>             device_release_driver ->
> > > >>>>>>                 device_release_driver_internal
> > > >>>>>>
> > > >>>>>> sysfs: unbind_store ->
> > > >>>>>>     device_release_driver ->
> > > >>>>>>         device_release_driver_internal
> > > >>>>>>
> > > >>>>>> The only way I've found to differentiate between these in a cleanup
> > > >>>>>> action is that device_del() uses the bus notifier to signal
> > > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > > >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> > > >>>>>
> > > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > >>>>> way to do this, but best to prototype a patch with this, send it to
> > > >>>>> him and ask how to :-)
> > > >>>>>
> > > >>>>
> > > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > > >>>> need to touch hw after DRM unregister.
> > > >>>>
> > > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > > >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> > > >>>>>> world :-)
> > > >>>>>
> > > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> > > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> > > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> > > >>>>>
> > > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > >>>>> apparently the cmdline tool you need, never tried it, I only switch
> > > >>>>> the kernel's console between fbcon and dummycon and back, not what
> > > >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> > > >>>>> And maybe you have a better idea how to deal with this all.
> > > >>>>>
> > > >>>>> Note also that there's been proposals floating around to only close an
> > > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > >>>>> does), with that closing userspace would not necessarily lead to a
> > > >>>>> full cleanup.
> > > >>>>>
> > > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > >>>>> is if you have the display on, but no planes showing (i.e. all black).
> > > >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> > > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > >>>>> even if fbcon/fbdev isn't even enabled.
> > > >>>>>
> > > >>>>
> > > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > > >>>> emulation also releases everything.
> > > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> > > >>>
> > > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> > > >>>
> > > >>> Super short summary: We want to start using devm actions to clean up drm
> > > >>> drivers. Here's the problem:
> > > >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> > > >>>   up the hardware and shut it all down.
> > > >>
> > > >> Then do it on probe/disconnect.
> > > >>
> > > >>> - But if the device is unplugged already, that's probably not the best
> > > >>>   idea, and we only want to clean up the kernel's resources/allocations.
> > > >>
> > > >> Again, probe/disconnect will be called either way.
> > > >>
> > > >> But as you note, memory will NOT be freed by the devm stuff if you
> > > >> manually unbind a driver from a device.
> > > >>
> > > >> So don't touch hardware there, it's not going to work :)
> > > >>
> > > >>> What's the recommendation here? I see a few options:
> > > >>>
> > > >>> - Make sure everything can deal with this properly. Hotunplug can happen
> > > >>>   anytime, so there's a race no matter what.
> > > >>
> > > >> Yes.
> > > >>
> > > >>> - Check with the device model whether the struct device is disappearing or
> > > >>>   whether we're just dealing with a driver unbind (no idea how to do
> > > >>>   that), and act accordingly.
> > > >>
> > > >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> > > >> Assuming your hardware is still present :)
> > > >>
> > > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> > > >>>   not, then the pretty nifty plan laid out in this thread wont work.
> > > >>
> > > >> Nope, that's not going to work, the device could either be long gone, or
> > > >> you will not be called due to unbind happening from userspace.
> > > >>
> > > >> But really, unbind from userspace is very very rare, it's a developer
> > > >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> > > >> :)
> > > >>
> > > >>> - Something completely different?
> > > >>
> > > >> Do it in disconnect :)
> > > >
> > > > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > > > should be the following sequence:
> > > >
> > > > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > > > make sure all the pending ioctls are done so that from now on
> > > > userspace sees lots of -EIO (in case it still has fd open, which is
> > > > going to be the normal for hotunplug.
> > > >
> > > > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > > > but needs to be able to cope with sudden hotunplug on top anyway).
> > > >
> > > > 3. Clean up the kernel mess and release everything.
> > > >
> > > > Probe is exactly the other way round, so would perfectly fit into the
> > > > devm onion cleanup. See in the commented earlier replies above how
> > > > that would match in details, but tldr; if we have to do 2. in
> > > > disconnect, then we also have to do 1. in disconnected, and only doing
> > > > 3. through devm is almost not worth the bother. But if we could do all
> > > > three through devm then simple drivers wouldn't even need any
> > > > disconnect/unbind callback at all. That's our motivation for trying to
> > > > come up with an answer that's not "do it in disconnect". "do it in
> > > > disconnect" is how we do it all today already.
> > > >
> > > > Yes we're trying to make tiny drivers even smaller, we have enough
> > > > nowadays that this stuff would be worth it :-)
> > > >
> > >
> > > I think a solution is to say that drivers that want to touch hw on
> > > disconnect needs to use device_driver->remove to do that.
> > >
> > > This is an example driver that doesn't need to touch hw because it's so
> > > simple that userspace has disabled the pipeline:
> > >
> > > static void drm_driver_release(struct drm_device *drm)
> > > {
> > >     drm_mode_config_cleanup(drm);
> > >     drm_dev_fini(drm);
> > >     kfree(drm);
> > > }
> > >
> > > static struct drm_driver drm_driver = {
> > >     .release = drm_driver_release,
> > >     /* ... */
> > > };
> > >
> > > static int driver_probe(struct device *dev)
> > > {
> > >     struct drm_device *drm;
> > >     int ret;
> > >
> > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > >     if (!drm)
> > >             return -ENOMEM;
> > >
> > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > >     if (ret) {
> > >             kfree(drm);
> > >             return ret;
> > >     }
> > >
> > >     drm_mode_config_init(drm);
> > >
> > >     /* Aquire various resources, all managed by devres */
> > >
> > >     drm_mode_config_reset(drm);
> > >
> > >     return devm_drm_dev_register(drm);
> > > }
> > >
> > > struct device_driver driver = {
> > >     .probe = driver_probe,
> > > };
> > >
> > >
> > > A driver that wants to touch hardware on disconnect, can look like this:
> > >
> > > static void drm_driver_release(struct drm_device *drm)
> > > {
> > >     drm_mode_config_cleanup(drm);
> > >     drm_dev_fini(drm);
> > >     kfree(drm);
> > > }
> > >
> > > static struct drm_driver drm_driver = {
> > >     .release = drm_driver_release,
> > >     /* ... */
> > > };
> > >
> > > static int driver_probe(struct device *dev)
> > > {
> > >     struct drm_device *drm;
> > >     int ret;
> > >
> > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > >     if (!drm)
> > >             return -ENOMEM;
> > >
> > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > >     if (ret) {
> > >             kfree(drm);
> > >             return ret;
> > >     }
> > >
> > >     drm_mode_config_init(drm);
> > >
> > >     /* Aquire various resources, all managed by devres */
> > >
> > >     drm_mode_config_reset(drm);
> > >
> > >     ret = drm_dev_register(drm);
> > >     if (ret)
> > >             return ret;
> > >
> > >     drm_dev_get(dev); /* If using drm_dev_unplug() */
> > >
> > >     dev_set_drvdata(dev, drm);
> > >
> > >     return 0;
> > > }
> > >
> > > /* This function is called before devres_release_all() */
> > > static int driver_remove(struct device *dev)
> > > {
> > >     struct drm_device *drm = dev_get_drvdata(dev);
> > >
> > >     drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> > >     drm_atomic_helper_shutdown(drm)
> > >
> > >     return 0;
> > > }
> > >
> > > struct device_driver driver = {
> > >     .probe = driver_probe,
> > >     .remove = driver_remove,
> >
> > That's exactly the pattern I'm trying to avoid, because imo your tiny
> > driver _also_ should do this. Or we realize that all the current drivers
> > doing drm_atomic_helper_shutdown are misguided, but I'm not really
> > understanding why.
> >
> > Having a devm helper which cannot be used for some drivers due to
> > fundamental design issues is kinda not great, because it means everyone
> > will still use it, and shrug the bugs off as "not my problem". Which is
> > what's happening right now with all the devm_kzalloc we have in drm
> > drivers that all the get lifetim wrong. But because devm_ is convenient
> > everyone uses it, so the driver unload of most drivers is full of bugs.
>
> driver "unload" should not be full of bugs, how would it?  Anything
> created with devm_() will just be freed when the device really goes away
> in the system, it shouldn't call back into the driver.

The problem is when drivers use devm_ not to allocate hw resources and
related things, but structures for objects with other lifetimes. Like
open file descriptors shared with the world.

> And yes, devm_() is a crutch, one that I really don't like, but I
> understand the apeal.  And in 95% of the cases, it can work just fine as
> no one ever does unbind/bind from userspace, and if they did, they would
> never notice the memory issues.

Yeah, additionally people tend to not hotunplug gpus. So even there we
tend to get away with piles of bugs. And I'm kinda ok with piles of
bugs in driver unload code, as long as it's just driver unload code
and mostly gets the job done (i.e. doesn't blow up when a developer
first kills X and then unloads their driver).

But for adding new infrastructure in drm core/helpers that's a
different beast. I think we could pull something off, but not quite
there yet with the solution. Maybe we need something similar to devm,
but for drm uapi objects and stuff, adapted to our needs.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux