On Tue, Jan 29, 2019 at 07:10:55PM +0100, Daniel Vetter wrote: > On Tue, Jan 29, 2019 at 6:36 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote: > > > On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote: > > > > > > > > > > > > Den 24.01.2019 18.57, skrev Daniel Vetter: > > > > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > >> > > > > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote: > > > > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote: > > > > >>>> > > > > >>>> > > > > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter: > > > > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@xxxxxxxxxxx> wrote: > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter: > > > > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote: > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter: > > > > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote: > > > > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote: > > > > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and > > > > >>>>>>>>>>> drm_dev_register(). > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic > > > > >>>>>>>>>>> fbdev emulation as well. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx> > > > > >>>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> <snip> > > > > >>>>>>>> > > > > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > > > > >>>>>>>>>>> index 381581b01d48..12129772be45 100644 > > > > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c > > > > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c > > > > >>>>>>>>>>> @@ -36,6 +36,7 @@ > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> #include <drm/drm_client.h> > > > > >>>>>>>>>>> #include <drm/drm_drv.h> > > > > >>>>>>>>>>> +#include <drm/drm_fb_helper.h> > > > > >>>>>>>>>>> #include <drm/drmP.h> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> #include "drm_crtc_internal.h" > > > > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev) > > > > >>>>>>>>>>> } > > > > >>>>>>>>>>> EXPORT_SYMBOL(drm_dev_unregister); > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data) > > > > >>>>>>>>>>> +{ > > > > >>>>>>>>>>> + drm_dev_put(data); > > > > >>>>>>>>> > > > > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe. > > > > >>>>>>>> > > > > >>>>>>>> This function is only used to cover the error path if probe fails before > > > > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is > > > > >>>>>>>> the one that calls unplug. There are comments about this in the functions. > > > > >>>>>>> > > > > >>>>>>> I think I get a prize for being ignorant and blind :-/ > > > > >>>>>>> > > > > >>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>>>> +} > > > > >>>>>>>>>>> + > > > > >>>>>>>>>>> +/** > > > > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init() > > > > >>>>>>>>>>> + * @parent: Parent device object > > > > >>>>>>>>>>> + * @dev: DRM device > > > > >>>>>>>>>>> + * @driver: DRM driver > > > > >>>>>>>>>>> + * > > > > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is > > > > >>>>>>>>>>> + * automatically released on driver detach. You must supply a > > > > >>>>>>>>> > > > > >>>>>>>>> I think a bit more clarity here would be good: > > > > >>>>>>>>> > > > > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()." > > > > >>>>>>>>> > > > > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly. > > > > >>>>>>>>> > > > > >>>>>>>>> I think a loud warning for these is in order: > > > > >>>>>>>>> > > > > >>>>>>>>> "WARNING: > > > > >>>>>>>>> > > > > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures > > > > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not > > > > >>>>>>>>> match. This here works because it doesn't immediately free anything, but > > > > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device > > > > >>>>>>>>> refcount through drm_dev_put(). > > > > >>>>>>>>> > > > > >>>>>>>>> "All other drm structures must still be explicitly released in the > > > > >>>>>>>>> &drm_driver.release callback." > > > > >>>>>>>>> > > > > >>>>>>>>> While thinking about this I just realized that with this design we have no > > > > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all > > > > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no > > > > >>>>>>>>> place to call it: > > > > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the > > > > >>>>>>>>> drm_dev_unregister in there must be called _before_ we start to shut > > > > >>>>>>>>> down anything. > > > > >>>>>>>>> - drm_driver.release is way too late. > > > > >>>>>>>>> > > > > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's > > > > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this > > > > >>>>>>>>> happens automatically and in the right order. > > > > >>>>>>>>> > > > > >>>>>>>>> So not sure what to do here really. > > > > >>>>>>>> > > > > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the > > > > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS > > > > >>>>>>>> helper instead?) > > > > >>>>>>> > > > > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile > > > > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly, > > > > >>>>>>> and in the right order. So: > > > > >>>>>>> > > > > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put()) > > > > >>>>>>> > > > > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action) > > > > >>>>>>> > > > > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action) > > > > >>>>>>> > > > > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it > > > > >>>>>>> can call drm_dev_unplug() unconditionally). > > > > >>>>>>> > > > > >>>>>> > > > > >>>>>> Beautiful! I really like this, it's very flexible. > > > > >>>>>> > > > > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the > > > > >>>>>> atomic helper... > > > > >>>>> > > > > >>>>> I think a new drm_devm.c helper would be nice for all this stuff. > > > > >>>>> Especially since you can't freely mix devm-based setup/cleanup with > > > > >>>>> normal cleanup I think it'd be good to have it all together in one > > > > >>>>> place. And perhaps even a code example in the DOC: overview. > > > > >>>>> > > > > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is > > > > >>>>>>> gone, but I think we can achieve that by liberally sprinkling > > > > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for > > > > >>>>>>> drm_mode_config_reset would be: > > > > >>>>>>> > > > > >>>>>>> { > > > > >>>>>>> if (drm_dev_enter()) > > > > >>>>>>> return; > > > > >>>>>>> > > > > >>>>>>> drm_atomic_helper_shutdown(); > > > > >>>>>>> > > > > >>>>>>> drm_dev_exit(); > > > > >>>>>>> } > > > > >>>>>>> > > > > >>>>>> > > > > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is > > > > >>>>>> registered or not, it doesn't say anything about the state of the parent > > > > >>>>>> device. > > > > >>>>>> > > > > >>>>>> All we know is that the device is being unbound from the driver, we > > > > >>>>>> don't know if it's the device that's being removed or if it's the driver > > > > >>>>>> that's unregistered. > > > > >>>>> > > > > >>>>> You're right, both paths will have called drm_dev_unplug by then. > > > > >>>>> Silly me. I really liked my idea :-) > > > > >>>>> > > > > >>>>>> I have looked at the various call chains: > > > > >>>>>> > > > > >>>>>> driver_unregister -> > > > > >>>>>> bus_remove_driver -> > > > > >>>>>> driver_detach -> > > > > >>>>>> device_release_driver_internal > > > > >>>>>> > > > > >>>>>> device_unregister -> > > > > >>>>>> device_del -> > > > > >>>>>> bus_remove_device -> > > > > >>>>>> device_release_driver -> > > > > >>>>>> device_release_driver_internal > > > > >>>>>> > > > > >>>>>> sysfs: unbind_store -> > > > > >>>>>> device_release_driver -> > > > > >>>>>> device_release_driver_internal > > > > >>>>>> > > > > >>>>>> The only way I've found to differentiate between these in a cleanup > > > > >>>>>> action is that device_del() uses the bus notifier to signal > > > > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a > > > > >>>>>> notifier could be used to set a drm_device->parent_removed flag. > > > > >>>>> > > > > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better > > > > >>>>> way to do this, but best to prototype a patch with this, send it to > > > > >>>>> him and ask how to :-) > > > > >>>>> > > > > >>>> > > > > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't > > > > >>>> need to touch hw after DRM unregister. > > > > >>>> > > > > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't > > > > >>>>>> everything get disabled when userspace closes? It does in my tinydrm > > > > >>>>>> world :-) > > > > >>>>> > > > > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches > > > > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and > > > > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the > > > > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to > > > > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all(). > > > > >>>>> > > > > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is > > > > >>>>> apparently the cmdline tool you need, never tried it, I only switch > > > > >>>>> the kernel's console between fbcon and dummycon and back, not what > > > > >>>>> fbcon drivers itself), then I think you should be able to reproduce. > > > > >>>>> And maybe you have a better idea how to deal with this all. > > > > >>>>> > > > > >>>>> Note also that there's been proposals floating around to only close an > > > > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl > > > > >>>>> does), with that closing userspace would not necessarily lead to a > > > > >>>>> full cleanup. > > > > >>>>> > > > > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers) > > > > >>>>> is if you have the display on, but no planes showing (i.e. all black). > > > > >>>>> Then all the fbs will be cleaned up, but drm_connector will be > > > > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown() > > > > >>>>> even if fbcon/fbdev isn't even enabled. > > > > >>>>> > > > > >>>> > > > > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in > > > > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev > > > > >>>> emulation also releases everything. > > > > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to > > > > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it? > > > > >>> > > > > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael. > > > > >>> > > > > >>> Super short summary: We want to start using devm actions to clean up drm > > > > >>> drivers. Here's the problem: > > > > >>> - For a driver unload/unbind without hotunplug, we want to properly clean > > > > >>> up the hardware and shut it all down. > > > > >> > > > > >> Then do it on probe/disconnect. > > > > >> > > > > >>> - But if the device is unplugged already, that's probably not the best > > > > >>> idea, and we only want to clean up the kernel's resources/allocations. > > > > >> > > > > >> Again, probe/disconnect will be called either way. > > > > >> > > > > >> But as you note, memory will NOT be freed by the devm stuff if you > > > > >> manually unbind a driver from a device. > > > > >> > > > > >> So don't touch hardware there, it's not going to work :) > > > > >> > > > > >>> What's the recommendation here? I see a few options: > > > > >>> > > > > >>> - Make sure everything can deal with this properly. Hotunplug can happen > > > > >>> anytime, so there's a race no matter what. > > > > >> > > > > >> Yes. > > > > >> > > > > >>> - Check with the device model whether the struct device is disappearing or > > > > >>> whether we're just dealing with a driver unbind (no idea how to do > > > > >>> that), and act accordingly. > > > > >> > > > > >> You don't know that, sorry. Just do any hardware stuff on disconnect. > > > > >> Assuming your hardware is still present :) > > > > >> > > > > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If > > > > >>> not, then the pretty nifty plan laid out in this thread wont work. > > > > >> > > > > >> Nope, that's not going to work, the device could either be long gone, or > > > > >> you will not be called due to unbind happening from userspace. > > > > >> > > > > >> But really, unbind from userspace is very very rare, it's a developer > > > > >> thing mostly. Oh and a virtual driver thing, but those people are crazy > > > > >> :) > > > > >> > > > > >>> - Something completely different? > > > > >> > > > > >> Do it in disconnect :) > > > > > > > > > > Ah, I forgot to mention the important constraint :-) disconnect/unbind > > > > > should be the following sequence: > > > > > > > > > > 1. Unregister all the userspace interfaces (there's a lot of them) and > > > > > make sure all the pending ioctls are done so that from now on > > > > > userspace sees lots of -EIO (in case it still has fd open, which is > > > > > going to be the normal for hotunplug. > > > > > > > > > > 2. Shut down hw and all ongoing operations (only relevant for unbind, > > > > > but needs to be able to cope with sudden hotunplug on top anyway). > > > > > > > > > > 3. Clean up the kernel mess and release everything. > > > > > > > > > > Probe is exactly the other way round, so would perfectly fit into the > > > > > devm onion cleanup. See in the commented earlier replies above how > > > > > that would match in details, but tldr; if we have to do 2. in > > > > > disconnect, then we also have to do 1. in disconnected, and only doing > > > > > 3. through devm is almost not worth the bother. But if we could do all > > > > > three through devm then simple drivers wouldn't even need any > > > > > disconnect/unbind callback at all. That's our motivation for trying to > > > > > come up with an answer that's not "do it in disconnect". "do it in > > > > > disconnect" is how we do it all today already. > > > > > > > > > > Yes we're trying to make tiny drivers even smaller, we have enough > > > > > nowadays that this stuff would be worth it :-) > > > > > > > > > > > > > I think a solution is to say that drivers that want to touch hw on > > > > disconnect needs to use device_driver->remove to do that. > > > > > > > > This is an example driver that doesn't need to touch hw because it's so > > > > simple that userspace has disabled the pipeline: > > > > > > > > static void drm_driver_release(struct drm_device *drm) > > > > { > > > > drm_mode_config_cleanup(drm); > > > > drm_dev_fini(drm); > > > > kfree(drm); > > > > } > > > > > > > > static struct drm_driver drm_driver = { > > > > .release = drm_driver_release, > > > > /* ... */ > > > > }; > > > > > > > > static int driver_probe(struct device *dev) > > > > { > > > > struct drm_device *drm; > > > > int ret; > > > > > > > > drm = kzalloc(sizeof(*drm), GFP_KERNEL); > > > > if (!drm) > > > > return -ENOMEM; > > > > > > > > ret = devm_drm_dev_init(dev, drm, &drm_driver); > > > > if (ret) { > > > > kfree(drm); > > > > return ret; > > > > } > > > > > > > > drm_mode_config_init(drm); > > > > > > > > /* Aquire various resources, all managed by devres */ > > > > > > > > drm_mode_config_reset(drm); > > > > > > > > return devm_drm_dev_register(drm); > > > > } > > > > > > > > struct device_driver driver = { > > > > .probe = driver_probe, > > > > }; > > > > > > > > > > > > A driver that wants to touch hardware on disconnect, can look like this: > > > > > > > > static void drm_driver_release(struct drm_device *drm) > > > > { > > > > drm_mode_config_cleanup(drm); > > > > drm_dev_fini(drm); > > > > kfree(drm); > > > > } > > > > > > > > static struct drm_driver drm_driver = { > > > > .release = drm_driver_release, > > > > /* ... */ > > > > }; > > > > > > > > static int driver_probe(struct device *dev) > > > > { > > > > struct drm_device *drm; > > > > int ret; > > > > > > > > drm = kzalloc(sizeof(*drm), GFP_KERNEL); > > > > if (!drm) > > > > return -ENOMEM; > > > > > > > > ret = devm_drm_dev_init(dev, drm, &drm_driver); > > > > if (ret) { > > > > kfree(drm); > > > > return ret; > > > > } > > > > > > > > drm_mode_config_init(drm); > > > > > > > > /* Aquire various resources, all managed by devres */ > > > > > > > > drm_mode_config_reset(drm); > > > > > > > > ret = drm_dev_register(drm); > > > > if (ret) > > > > return ret; > > > > > > > > drm_dev_get(dev); /* If using drm_dev_unplug() */ > > > > > > > > dev_set_drvdata(dev, drm); > > > > > > > > return 0; > > > > } > > > > > > > > /* This function is called before devres_release_all() */ > > > > static int driver_remove(struct device *dev) > > > > { > > > > struct drm_device *drm = dev_get_drvdata(dev); > > > > > > > > drm_dev_unplug(drm); OR drm_dev_unregister(drm); > > > > drm_atomic_helper_shutdown(drm) > > > > > > > > return 0; > > > > } > > > > > > > > struct device_driver driver = { > > > > .probe = driver_probe, > > > > .remove = driver_remove, > > > > > > That's exactly the pattern I'm trying to avoid, because imo your tiny > > > driver _also_ should do this. Or we realize that all the current drivers > > > doing drm_atomic_helper_shutdown are misguided, but I'm not really > > > understanding why. > > > > > > Having a devm helper which cannot be used for some drivers due to > > > fundamental design issues is kinda not great, because it means everyone > > > will still use it, and shrug the bugs off as "not my problem". Which is > > > what's happening right now with all the devm_kzalloc we have in drm > > > drivers that all the get lifetim wrong. But because devm_ is convenient > > > everyone uses it, so the driver unload of most drivers is full of bugs. > > > > driver "unload" should not be full of bugs, how would it? Anything > > created with devm_() will just be freed when the device really goes away > > in the system, it shouldn't call back into the driver. > > The problem is when drivers use devm_ not to allocate hw resources and > related things, but structures for objects with other lifetimes. Like > open file descriptors shared with the world. And irqs, which bites everyone in the end. You have to be careful here, never tie a devm allocation to an object with another reference count, that's just a bug. greg k-h _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel