On Wed, Feb 06, 2019 at 05:46:51PM +0100, Noralf Trønnes wrote: > > > Den 06.02.2019 16.26, skrev Daniel Vetter: > > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote: > >> > >> > >> Den 05.02.2019 17.31, skrev Daniel Vetter: > >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote: > >>>> > >>>> > >>>> Den 05.02.2019 10.11, skrev Daniel Vetter: > >>>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote: > >>>>>> > >>>>>> > >>>>>> Den 04.02.2019 16.41, skrev Daniel Vetter: > >>>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote: > >>>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets > >>>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we > >>>>>>>> can remove drm_dev_unplug(). > >>>>>>>> > >>>>>>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx> > >>>>>>>> --- > >>>>>> > >>>>>> [...] > >>>>>> > >>>>>>>> drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------ > >>>>>>>> include/drm/drm_drv.h | 10 ++++------ > >>>>>>>> 2 files changed, 19 insertions(+), 18 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c > >>>>>>>> index 05bbc2b622fc..e0941200edc6 100644 > >>>>>>>> --- a/drivers/gpu/drm/drm_drv.c > >>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c > >>>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit); > >>>>>>>> */ > >>>>>>>> void drm_dev_unplug(struct drm_device *dev) > >>>>>>>> { > >>>>>>>> - /* > >>>>>>>> - * After synchronizing any critical read section is guaranteed to see > >>>>>>>> - * the new value of ->unplugged, and any critical section which might > >>>>>>>> - * still have seen the old value of ->unplugged is guaranteed to have > >>>>>>>> - * finished. > >>>>>>>> - */ > >>>>>>>> - dev->unplugged = true; > >>>>>>>> - synchronize_srcu(&drm_unplug_srcu); > >>>>>>>> - > >>>>>>>> drm_dev_unregister(dev); > >>>>>>>> drm_dev_put(dev); > >>>>>>>> } > >>>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register); > >>>>>>>> * drm_dev_register() but does not deallocate the device. The caller must call > >>>>>>>> * drm_dev_put() to drop their final reference. > >>>>>>>> * > >>>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(), > >>>>>>>> - * which can be called while there are still open users of @dev. > >>>>>>>> + * This function can be called while there are still open users of @dev as long > >>>>>>>> + * as the driver protects its device resources using drm_dev_enter() and > >>>>>>>> + * drm_dev_exit(). > >>>>>>>> * > >>>>>>>> * This should be called first in the device teardown code to make sure > >>>>>>>> - * userspace can't access the device instance any more. > >>>>>>>> + * userspace can't access the device instance any more. Drivers that support > >>>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first > >>>>>>> > >>>>>>> Read once more with a bit more coffee, spotted this: > >>>>>>> > >>>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from > >>>>>>> userspace is kinda the wrong way round. It should be the inverse of driver > >>>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with > >>>>>>> the world (simplified ofc). > >>>>>>> > >>>>>> > >>>>>> The problem is that drm_dev_unregister() sets the device as unplugged > >>>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not > >>>>>> allowed to touch hardware. > >>>>>> > >>>>>> I know it's the wrong order, but the only way to do it in the right > >>>>>> order is to have a separate function that sets unplugged: > >>>>>> > >>>>>> drm_dev_unregister(); > >>>>>> drm_atomic_helper_shutdown(); > >>>>>> drm_dev_set_unplugged(); > >>>>> > >>>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is > >>>>> also not going to work. Because userspace could quickly re-enable > >>>>> something, and then the refcounts would be all wrong again and leaking > >>>>> objects. > >>>>> > >>>> > >>>> What happens with a USB device that is unplugged with open userspace, > >>>> will that leak objects? > >>> > >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run > >>> as normal, the only thing that should be skipped is actually touching the > >>> hw (as long as the driver doesn't protect too much with > >>> drm_dev_enter/exit). So all the software updates (including refcounting > >>> updates) will still be done. Ofc current udl is not yet atomic, so in > >>> reality something else happens. > >>> > >>> And we ofc still have the same issue that if you just unload the driver, > >>> then the hw will stay on (which might really confuse the driver on next > >>> load, when it assumes that it only gets loaded from cold boot where > >>> everything is off - which usually is the case on an arm soc at least). > >>> > >>>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize > >>>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big > >>>>> step forward already, and will open up a lot of TODO items across a lot of > >>>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_* > >>>>> structs, which gets released together with drm_device. I think that's a > >>>>> much clearer path forward, I think we all agree that getting the kfree out > >>>>> of the driver codes is a good thing, and it would allow us to do this > >>>>> correctly. > >>>>> > >>>>> Then once we have that and rolled out to a few drivers we can reconsider > >>>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea > >>>>> how to do this properly :-/ > >>>>> > >>>>> Thoughts, other ideas? > >>>>> > >>>> > >>>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't > >>>> make much sense if we need a driver remove callback anyways. > >>> > >>> Yup, that's what I meant with the above: > >>> - merge devm_drm_dev_register, use it a lot. This is definitely a good > >>> idea. > >>> - create a drm_dev_kzalloc, which automatically releases memory on the > >>> final drm_dev_put. Use it every in drivers for drm structures, > >>> especially in those drivers that currently use devm (which releases > >>> those allocations potentialy too early on unplug). > >>> - figure out the next steps after that > >>> > >>>> I think devm_drm_dev_init() makes sense because it yields a cleaner > >>>> probe() function. An additional benefit is that it requires a > >>>> drm_driver->release function which is a step in the right direction to > >>>> get the drm_device lifetime right. > >>>> > >>>> Do we agree that a drm_dev_set_unplugged() function is necessary to get > >>>> the remove/disconnect order right? > >>>> > >>>> What about drm_dev_unplug() maybe I should just leave it be? > >>>> > >>>> - amd uses drm_driver->unload, so that one takes some work to get right > >>>> to support unplug. It doesn't check the unplugged state, so really > >>>> doesn't need drm_dev_unplug() I guess. Do they have cards that can be > >>>> hotplugged? > >>> > >>> Yeah amd still uses ->load and ->unload, which is not great unfortunately. > >>> I just stumbled over that when I tried to make a series to disable the > >>> global drm_global_mutex for most drivers ... > >>> > >>>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown(). > >>>> It has only one drm_dev_is_unplugged() check and that is in its > >>>> fbdev->open callback. > >>> > >>> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have > >>> refcounting issues when something is left on iirc. I think udl is written > >>> under the assumption that ->unload is always called for an unplug, never > >>> for an actual unload. > >>> > >>>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release > >>>> callback which I guess is not correct. > >>> > >>> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel > >>> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup > >>> could be put there, not sure honestly. But definitely not _shutdown(). > >>> > >>>> This is what it will look like with a set unplugged function: > >>>> > >>>> void drm_dev_unplug(struct drm_device *dev) > >>>> { > >>>> drm_dev_set_unplugged(dev); > >>>> drm_dev_unregister(dev); > >>>> drm_dev_put(dev); > >>>> } > >>>> > >>>> Hm, I should probably remove it to avoid further use of it since it's > >>>> not correct for a modern driver. > >>> > >>> I think something flew over my head ... what's the drm_dev_set_unplugged > >>> for? I think I'm not following you ... > >>> > >> > >> Taking it a few steps back: > >> > >> This patch proposes to move 'dev->unplugged = true' from > >> drm_dev_unplug() to drm_dev_unregister(). > >> > >> Additionally I proposed this change to the drm_dev_unregister() docs: > >> > >> * This should be called first in the device teardown code to make sure > >> - * userspace can't access the device instance any more. > >> + * userspace can't access the device instance any more. Drivers that > >> support > >> + * device unplug will probably want to call > >> drm_atomic_helper_shutdown() first > >> + * in order to disable the hardware on regular driver module unload. > >> > >> Which would give a driver remove callback like this: > >> > >> static int driver_remove() > >> { > >> drm_atomic_helper_shutdown(); > >> drm_dev_unregister(); > >> } > >> > >> Your reaction was that drm_atomic_helper_shutdown() needs to be called > >> after drm_dev_unregister() to avoid a race resulting in leaked objects. > >> However if we call it afterwards, ->unplugged will be true and the > >> driver can't touch hardware. > >> > >> Then I proposed moving the unplugged state setting to: > >> > >> void drm_dev_set_unplugged(struct drm_device *dev) > >> { > >> dev->unplugged = true; > >> synchronize_srcu(&drm_unplug_srcu); > >> } > >> > >> Now it is possible to avoid the race and still touch hardware: > >> > >> static int driver_remove() > >> { > >> drm_dev_unregister(); > >> drm_atomic_helper_shutdown(); > >> drm_dev_set_unplugged(); > >> } > >> > >> But now I'm back to the question: Is it the driver or the device that is > >> going away? > >> > >> If it's the driver we are fine touching hardware, if it's the device it > >> depends on how we access the device resource and whether the subsystem > >> has protection in place to handle access after the device is gone. I > >> think USB can handle and block device access up until the disconnect > >> callback has finished (no point in doing so though, since the normal > >> operation is that the device is gone, not the driver unloading). > >> > >> Is there a way to determine who's going away without changes to the > >> device core? > >> > >> Maybe. The driver can only be unregistered if there are no open file > >> handles because a ref is taken on the driver module. > > > > This isn't true. You can (not many people do, but it's possible) to unbind > > through sysfs. The module reference only keeps the cpu instructions valid, > > nothing else. > > > >> So maybe something along these lines: > >> > >> int drm_dev_open_count(struct drm_device *dev) > >> { > >> int open_count; > >> > >> mutex_lock(&drm_global_mutex); > >> open_count = dev->open_count; > >> mutex_unlock(&drm_global_mutex); > > > > Random style nit: READ_ONCE, no locking needed (the locks don't change > > anything, except if you have really strange locking rules). Serves > > double-duty as a huge warning sign that something tricky is happening. > > > >> return open_count; > >> } > >> > >> static int driver_remove() > >> { > >> drm_dev_unregister(); > >> > >> open_count = drm_dev_open_count(); > >> > >> /* Open fd's, device is going away, block access */ > >> if (open_count) > >> drm_dev_set_unplugged(); > >> > >> drm_atomic_helper_shutdown(); > >> > >> /* No open fd's, driver is going away */ > >> if (!open_count) > >> drm_dev_set_unplugged(); > >> } > >> > >> > >> Personally I have 2 use cases: > >> - tinydrm SPI drivers > >> The only hotpluggable SPI controllers I know of are USB which should > >> handle device access while unregistering. > >> SPI devices can be removed, but the controller driver is still in > >> place so it's safe. > >> - A future USB driver (that I hope to work on when all this core stuff > >> is in place). > >> There's no point in touching the hw, so DRM can be set uplugged right > >> away in the disconnect() callback. > >> > >> This means that I don't need to know who's going away for my use cases. > >> > >> This turned out to be quite long winding, but the bottom line is that > >> having a separate function to set the unplugged state seems to give a > >> lot of flexibility for various use cases. > >> > >> I hope I didn't muddy the waters even more :-) > > > > Hm, I think I get your idea. Since I'm still not sure what the best option > > is I'm leaning towards just leaving stuff as-is. I.e. drivers which want > > to shut down hw can do the 1. drm_dev_unregister() 2. > > drm_atomic_helper_shutdown() sequence. Drivers which care about hotunplug > > more can do just the drm_dev_unplug(). > > > > Yes it's messy and unsatisfactory, but in case of serious doubt I like to > > wait until maybe in the future we have a good idea. Meanwhile leaving a > > bit of a mess around is better than charging ahead in a possibly totally > > wrong direction. > > > > There's cases where clue still hasn't hit me, even years later (or anyone > > else). That's just how it is sometimes. > > > > Ok, I'll drop this. > > This means that I'll drop devm_drm_dev_init() as well since it doesn't > play well with drm_dev_unplug() since both will do drm_dev_put(). Not a > big deal really, it just means that I need to add one error path in the > probe function so I can drm_dev_put() on error. Hm, I think we could just move the drm_dev_put out from drm_dev_unplug. And then encourage everyone to use devm_drm_dev_init() everywhere. I do like to get started with auto-cleaned up in drm somehow, and devm_drm_dev_init doing the drm_dev_put() looks like a really good idea. > Are you still ok with the first bug fix patch in this series? Yeah that one still looks good. Cheers, Daniel > > > Zooming out more looking at the big picture I'd say all your work in the > > past few years has enormously simplified drm for simple drivers already. > > If we can't resolve this one here right now that just means you "only" > > made drm 98% simpler instead of maybe 99%. It's still an epic win :-) > > > > Thanks, your mentoring and reviews helped turn my rough ideas into > something useful :-) > > Noralf. > > > Cheers, Daniel > > > > > >> Noralf. > >> > >>> Thanks, Daniel > >>> > >>>> > >>>> Noralf. > >>>> > >>>>> Cheers, Daniel > >>>>> > >>>>>> Noralf. > >>>>>> > >>>>>>>> + * in order to disable the hardware on regular driver module unload. > >>>>>>>> */ > >>>>>>>> void drm_dev_unregister(struct drm_device *dev) > >>>>>>>> { > >>>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev) > >>>>>>>> if (drm_core_check_feature(dev, DRIVER_LEGACY)) > >>>>>>>> drm_lastclose(dev); > >>>>>>>> > >>>>>>>> + /* > >>>>>>>> + * After synchronizing any critical read section is guaranteed to see > >>>>>>>> + * the new value of ->unplugged, and any critical section which might > >>>>>>>> + * still have seen the old value of ->unplugged is guaranteed to have > >>>>>>>> + * finished. > >>>>>>>> + */ > >>>>>>>> + dev->unplugged = true; > >>>>>>>> + synchronize_srcu(&drm_unplug_srcu); > >>>>>>>> + > >>>>>>>> dev->registered = false; > >>>>>>>> > >>>>>>>> drm_client_dev_unregister(dev); > >>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h > >>>>>>>> index ca46a45a9cce..c50696c82a42 100644 > >>>>>>>> --- a/include/drm/drm_drv.h > >>>>>>>> +++ b/include/drm/drm_drv.h > >>>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev); > >>>>>>>> * drm_dev_is_unplugged - is a DRM device unplugged > >>>>>>>> * @dev: DRM device > >>>>>>>> * > >>>>>>>> - * This function can be called to check whether a hotpluggable is unplugged. > >>>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is > >>>>>>>> - * unplugged, these two functions guarantee that any store before calling > >>>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes > >>>>>>>> + * This function can be called to check whether @dev is unregistered. This can > >>>>>>>> + * be used to detect that the underlying parent device is gone. > >>>>>>> > >>>>>>> I think it'd be good to keep the first part, and just update the reference > >>>>>>> to drm_dev_unregister. So: > >>>>>>> > >>>>>>> * This function can be called to check whether a hotpluggable is unplugged. > >>>>>>> * Unplugging itself is singalled through drm_dev_unregister(). If a device is > >>>>>>> * unplugged, these two functions guarantee that any store before calling > >>>>>>> * drm_dev_unregister() is visible to callers of this function after it > >>>>>>> * completes. > >>>>>>> > >>>>>>> I think your version shrugs a few important details under the rug. With > >>>>>>> those nits addressed: > >>>>>>> > >>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > >>>>>>> > >>>>>>> Cheers, Daniel > >>>>>>> > >>>>>>>> * > >>>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is > >>>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and > >>>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It > >>>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and > >>>>>>>> * drm_dev_exit() function pairs. > >>>>>>>> */ > >>>>>>>> static inline bool drm_dev_is_unplugged(struct drm_device *dev) > >>>>>>>> -- > >>>>>>>> 2.20.1 > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Intel-gfx mailing list > >>>>>>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx > >>>>>>> > >>>>> > >>> > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx