Den 06.02.2019 16.26, skrev Daniel Vetter: > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote: >> >> >> Den 05.02.2019 17.31, skrev Daniel Vetter: >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote: >>>> >>>> >>>> Den 05.02.2019 10.11, skrev Daniel Vetter: >>>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote: >>>>>> >>>>>> >>>>>> Den 04.02.2019 16.41, skrev Daniel Vetter: >>>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote: >>>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets >>>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we >>>>>>>> can remove drm_dev_unplug(). >>>>>>>> >>>>>>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx> >>>>>>>> --- >>>>>> >>>>>> [...] >>>>>> >>>>>>>> drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------ >>>>>>>> include/drm/drm_drv.h | 10 ++++------ >>>>>>>> 2 files changed, 19 insertions(+), 18 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c >>>>>>>> index 05bbc2b622fc..e0941200edc6 100644 >>>>>>>> --- a/drivers/gpu/drm/drm_drv.c >>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c >>>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit); >>>>>>>> */ >>>>>>>> void drm_dev_unplug(struct drm_device *dev) >>>>>>>> { >>>>>>>> - /* >>>>>>>> - * After synchronizing any critical read section is guaranteed to see >>>>>>>> - * the new value of ->unplugged, and any critical section which might >>>>>>>> - * still have seen the old value of ->unplugged is guaranteed to have >>>>>>>> - * finished. >>>>>>>> - */ >>>>>>>> - dev->unplugged = true; >>>>>>>> - synchronize_srcu(&drm_unplug_srcu); >>>>>>>> - >>>>>>>> drm_dev_unregister(dev); >>>>>>>> drm_dev_put(dev); >>>>>>>> } >>>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register); >>>>>>>> * drm_dev_register() but does not deallocate the device. The caller must call >>>>>>>> * drm_dev_put() to drop their final reference. >>>>>>>> * >>>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(), >>>>>>>> - * which can be called while there are still open users of @dev. >>>>>>>> + * This function can be called while there are still open users of @dev as long >>>>>>>> + * as the driver protects its device resources using drm_dev_enter() and >>>>>>>> + * drm_dev_exit(). >>>>>>>> * >>>>>>>> * This should be called first in the device teardown code to make sure >>>>>>>> - * userspace can't access the device instance any more. >>>>>>>> + * userspace can't access the device instance any more. Drivers that support >>>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first >>>>>>> >>>>>>> Read once more with a bit more coffee, spotted this: >>>>>>> >>>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from >>>>>>> userspace is kinda the wrong way round. It should be the inverse of driver >>>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with >>>>>>> the world (simplified ofc). >>>>>>> >>>>>> >>>>>> The problem is that drm_dev_unregister() sets the device as unplugged >>>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not >>>>>> allowed to touch hardware. >>>>>> >>>>>> I know it's the wrong order, but the only way to do it in the right >>>>>> order is to have a separate function that sets unplugged: >>>>>> >>>>>> drm_dev_unregister(); >>>>>> drm_atomic_helper_shutdown(); >>>>>> drm_dev_set_unplugged(); >>>>> >>>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is >>>>> also not going to work. Because userspace could quickly re-enable >>>>> something, and then the refcounts would be all wrong again and leaking >>>>> objects. >>>>> >>>> >>>> What happens with a USB device that is unplugged with open userspace, >>>> will that leak objects? >>> >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run >>> as normal, the only thing that should be skipped is actually touching the >>> hw (as long as the driver doesn't protect too much with >>> drm_dev_enter/exit). So all the software updates (including refcounting >>> updates) will still be done. Ofc current udl is not yet atomic, so in >>> reality something else happens. >>> >>> And we ofc still have the same issue that if you just unload the driver, >>> then the hw will stay on (which might really confuse the driver on next >>> load, when it assumes that it only gets loaded from cold boot where >>> everything is off - which usually is the case on an arm soc at least). >>> >>>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize >>>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big >>>>> step forward already, and will open up a lot of TODO items across a lot of >>>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_* >>>>> structs, which gets released together with drm_device. I think that's a >>>>> much clearer path forward, I think we all agree that getting the kfree out >>>>> of the driver codes is a good thing, and it would allow us to do this >>>>> correctly. >>>>> >>>>> Then once we have that and rolled out to a few drivers we can reconsider >>>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea >>>>> how to do this properly :-/ >>>>> >>>>> Thoughts, other ideas? >>>>> >>>> >>>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't >>>> make much sense if we need a driver remove callback anyways. >>> >>> Yup, that's what I meant with the above: >>> - merge devm_drm_dev_register, use it a lot. This is definitely a good >>> idea. >>> - create a drm_dev_kzalloc, which automatically releases memory on the >>> final drm_dev_put. Use it every in drivers for drm structures, >>> especially in those drivers that currently use devm (which releases >>> those allocations potentialy too early on unplug). >>> - figure out the next steps after that >>> >>>> I think devm_drm_dev_init() makes sense because it yields a cleaner >>>> probe() function. An additional benefit is that it requires a >>>> drm_driver->release function which is a step in the right direction to >>>> get the drm_device lifetime right. >>>> >>>> Do we agree that a drm_dev_set_unplugged() function is necessary to get >>>> the remove/disconnect order right? >>>> >>>> What about drm_dev_unplug() maybe I should just leave it be? >>>> >>>> - amd uses drm_driver->unload, so that one takes some work to get right >>>> to support unplug. It doesn't check the unplugged state, so really >>>> doesn't need drm_dev_unplug() I guess. Do they have cards that can be >>>> hotplugged? >>> >>> Yeah amd still uses ->load and ->unload, which is not great unfortunately. >>> I just stumbled over that when I tried to make a series to disable the >>> global drm_global_mutex for most drivers ... >>> >>>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown(). >>>> It has only one drm_dev_is_unplugged() check and that is in its >>>> fbdev->open callback. >>> >>> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have >>> refcounting issues when something is left on iirc. I think udl is written >>> under the assumption that ->unload is always called for an unplug, never >>> for an actual unload. >>> >>>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release >>>> callback which I guess is not correct. >>> >>> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel >>> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup >>> could be put there, not sure honestly. But definitely not _shutdown(). >>> >>>> This is what it will look like with a set unplugged function: >>>> >>>> void drm_dev_unplug(struct drm_device *dev) >>>> { >>>> drm_dev_set_unplugged(dev); >>>> drm_dev_unregister(dev); >>>> drm_dev_put(dev); >>>> } >>>> >>>> Hm, I should probably remove it to avoid further use of it since it's >>>> not correct for a modern driver. >>> >>> I think something flew over my head ... what's the drm_dev_set_unplugged >>> for? I think I'm not following you ... >>> >> >> Taking it a few steps back: >> >> This patch proposes to move 'dev->unplugged = true' from >> drm_dev_unplug() to drm_dev_unregister(). >> >> Additionally I proposed this change to the drm_dev_unregister() docs: >> >> * This should be called first in the device teardown code to make sure >> - * userspace can't access the device instance any more. >> + * userspace can't access the device instance any more. Drivers that >> support >> + * device unplug will probably want to call >> drm_atomic_helper_shutdown() first >> + * in order to disable the hardware on regular driver module unload. >> >> Which would give a driver remove callback like this: >> >> static int driver_remove() >> { >> drm_atomic_helper_shutdown(); >> drm_dev_unregister(); >> } >> >> Your reaction was that drm_atomic_helper_shutdown() needs to be called >> after drm_dev_unregister() to avoid a race resulting in leaked objects. >> However if we call it afterwards, ->unplugged will be true and the >> driver can't touch hardware. >> >> Then I proposed moving the unplugged state setting to: >> >> void drm_dev_set_unplugged(struct drm_device *dev) >> { >> dev->unplugged = true; >> synchronize_srcu(&drm_unplug_srcu); >> } >> >> Now it is possible to avoid the race and still touch hardware: >> >> static int driver_remove() >> { >> drm_dev_unregister(); >> drm_atomic_helper_shutdown(); >> drm_dev_set_unplugged(); >> } >> >> But now I'm back to the question: Is it the driver or the device that is >> going away? >> >> If it's the driver we are fine touching hardware, if it's the device it >> depends on how we access the device resource and whether the subsystem >> has protection in place to handle access after the device is gone. I >> think USB can handle and block device access up until the disconnect >> callback has finished (no point in doing so though, since the normal >> operation is that the device is gone, not the driver unloading). >> >> Is there a way to determine who's going away without changes to the >> device core? >> >> Maybe. The driver can only be unregistered if there are no open file >> handles because a ref is taken on the driver module. > > This isn't true. You can (not many people do, but it's possible) to unbind > through sysfs. The module reference only keeps the cpu instructions valid, > nothing else. > >> So maybe something along these lines: >> >> int drm_dev_open_count(struct drm_device *dev) >> { >> int open_count; >> >> mutex_lock(&drm_global_mutex); >> open_count = dev->open_count; >> mutex_unlock(&drm_global_mutex); > > Random style nit: READ_ONCE, no locking needed (the locks don't change > anything, except if you have really strange locking rules). Serves > double-duty as a huge warning sign that something tricky is happening. > >> return open_count; >> } >> >> static int driver_remove() >> { >> drm_dev_unregister(); >> >> open_count = drm_dev_open_count(); >> >> /* Open fd's, device is going away, block access */ >> if (open_count) >> drm_dev_set_unplugged(); >> >> drm_atomic_helper_shutdown(); >> >> /* No open fd's, driver is going away */ >> if (!open_count) >> drm_dev_set_unplugged(); >> } >> >> >> Personally I have 2 use cases: >> - tinydrm SPI drivers >> The only hotpluggable SPI controllers I know of are USB which should >> handle device access while unregistering. >> SPI devices can be removed, but the controller driver is still in >> place so it's safe. >> - A future USB driver (that I hope to work on when all this core stuff >> is in place). >> There's no point in touching the hw, so DRM can be set uplugged right >> away in the disconnect() callback. >> >> This means that I don't need to know who's going away for my use cases. >> >> This turned out to be quite long winding, but the bottom line is that >> having a separate function to set the unplugged state seems to give a >> lot of flexibility for various use cases. >> >> I hope I didn't muddy the waters even more :-) > > Hm, I think I get your idea. Since I'm still not sure what the best option > is I'm leaning towards just leaving stuff as-is. I.e. drivers which want > to shut down hw can do the 1. drm_dev_unregister() 2. > drm_atomic_helper_shutdown() sequence. Drivers which care about hotunplug > more can do just the drm_dev_unplug(). > > Yes it's messy and unsatisfactory, but in case of serious doubt I like to > wait until maybe in the future we have a good idea. Meanwhile leaving a > bit of a mess around is better than charging ahead in a possibly totally > wrong direction. > > There's cases where clue still hasn't hit me, even years later (or anyone > else). That's just how it is sometimes. > Ok, I'll drop this. This means that I'll drop devm_drm_dev_init() as well since it doesn't play well with drm_dev_unplug() since both will do drm_dev_put(). Not a big deal really, it just means that I need to add one error path in the probe function so I can drm_dev_put() on error. Are you still ok with the first bug fix patch in this series? > Zooming out more looking at the big picture I'd say all your work in the > past few years has enormously simplified drm for simple drivers already. > If we can't resolve this one here right now that just means you "only" > made drm 98% simpler instead of maybe 99%. It's still an epic win :-) > Thanks, your mentoring and reviews helped turn my rough ideas into something useful :-) Noralf. > Cheers, Daniel > > >> Noralf. >> >>> Thanks, Daniel >>> >>>> >>>> Noralf. >>>> >>>>> Cheers, Daniel >>>>> >>>>>> Noralf. >>>>>> >>>>>>>> + * in order to disable the hardware on regular driver module unload. >>>>>>>> */ >>>>>>>> void drm_dev_unregister(struct drm_device *dev) >>>>>>>> { >>>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev) >>>>>>>> if (drm_core_check_feature(dev, DRIVER_LEGACY)) >>>>>>>> drm_lastclose(dev); >>>>>>>> >>>>>>>> + /* >>>>>>>> + * After synchronizing any critical read section is guaranteed to see >>>>>>>> + * the new value of ->unplugged, and any critical section which might >>>>>>>> + * still have seen the old value of ->unplugged is guaranteed to have >>>>>>>> + * finished. >>>>>>>> + */ >>>>>>>> + dev->unplugged = true; >>>>>>>> + synchronize_srcu(&drm_unplug_srcu); >>>>>>>> + >>>>>>>> dev->registered = false; >>>>>>>> >>>>>>>> drm_client_dev_unregister(dev); >>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h >>>>>>>> index ca46a45a9cce..c50696c82a42 100644 >>>>>>>> --- a/include/drm/drm_drv.h >>>>>>>> +++ b/include/drm/drm_drv.h >>>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev); >>>>>>>> * drm_dev_is_unplugged - is a DRM device unplugged >>>>>>>> * @dev: DRM device >>>>>>>> * >>>>>>>> - * This function can be called to check whether a hotpluggable is unplugged. >>>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is >>>>>>>> - * unplugged, these two functions guarantee that any store before calling >>>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes >>>>>>>> + * This function can be called to check whether @dev is unregistered. This can >>>>>>>> + * be used to detect that the underlying parent device is gone. >>>>>>> >>>>>>> I think it'd be good to keep the first part, and just update the reference >>>>>>> to drm_dev_unregister. So: >>>>>>> >>>>>>> * This function can be called to check whether a hotpluggable is unplugged. >>>>>>> * Unplugging itself is singalled through drm_dev_unregister(). If a device is >>>>>>> * unplugged, these two functions guarantee that any store before calling >>>>>>> * drm_dev_unregister() is visible to callers of this function after it >>>>>>> * completes. >>>>>>> >>>>>>> I think your version shrugs a few important details under the rug. With >>>>>>> those nits addressed: >>>>>>> >>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx> >>>>>>> >>>>>>> Cheers, Daniel >>>>>>> >>>>>>>> * >>>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is >>>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and >>>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It >>>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and >>>>>>>> * drm_dev_exit() function pairs. >>>>>>>> */ >>>>>>>> static inline bool drm_dev_is_unplugged(struct drm_device *dev) >>>>>>>> -- >>>>>>>> 2.20.1 >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Intel-gfx mailing list >>>>>>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx >>>>>>> >>>>> >>> > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx