On Tue, Jan 15, 2019 at 11:47 PM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: > > [CC Greg] > > On Tuesday, January 15, 2019 1:04:02 AM CET Rafael J. Wysocki wrote: > > [CC Lukas and linux-pm] > > > > On Mon, Jan 14, 2019 at 1:32 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > > > > > On Fri, Jan 11, 2019 at 3:49 PM Russell King - ARM Linux > > > <linux@xxxxxxxxxxxxxxx> wrote: > > > > [cut] > > > > > > > > > > This thread is discussing how to deal with Armada DRM, its use of the > > > > component helper, TDA998x's hybrid use of the component helper and > > > > DRM bridges. > > > > > > > > Currently, DRM bridges register into the DRM system and are added to > > > > a global list of bridges. When a DRM driver binds, it looks up the > > > > DRM bridge and attaches to it. There is no way in generic DRM code > > > > for the DRM driver to know if the DRM bridge is unbound from DRM, > > > > consequently a DRM driver may continue trying to call functions in > > > > the DRM bridge driver using memory that has already been freed. > > > > > > > > We had similar issues with imx-drm, and a number of years ago at a > > > > kernel summit, this was discussed, and I proposed a system which is > > > > now known as the component helper. This handles the problem of a > > > > multi-component system. > > > > > > > > However, DRM bridge was already established, and there appears to be > > > > no desire to convert DRM bridges and DRM drivers to use the component > > > > helper. > > > > > > > > We are presently in the situation where Armada DRM is incompatible > > > > with the DRM bridges way of doing things, and making it compatible > > > > essentially means introducing potential use-after-free bugs into the > > > > code. > > > > > > > > Device links in their stateful form appear to provide an alternative > > > > to the component helper, in that a stateful device link will remove > > > > consumers of a resource when the supplier is going away - which is > > > > exactly the problem which the component helper is solving. The > > > > difference is that device links look like being a cleaner solution. > > > > > > > > Just like the component helper, a stateful link would unbind the > > > > consumer of a resource when the supplier goes away - which is exactly > > > > the behaviour we're wanting. > > > > > > > > The problem is that the connection between various drivers is only > > > > really known when the drivers obtain their resources, and the > > > > following can happen: > > > > > > > > supplier consumer > > > > > > > > probe() > > > > alloc > > > > probe() > > > > publish > > > > obtain supplier's resource > > > > return > > > > > > > > Where things go wrong is if a stateful link is created when the > > > > consumer obtains the suppliers resource before the supplier has > > > > finished probing - which from what's been said is illegal. > > > > > > It just doesn't work (which means "invalid" rather than "illegal" I > > > guess, but whatever :-)). > > > > > > Admittedly, the original design didn't take this particular case into > > > account and I'm not actually sure how it can be taken into account > > > either. > > > > > > If the link is created by the consumer in the scenario above, its > > > status will be updated twice in a row after the consumer probe returns > > > and after the supplier probe returns. It looks like this update would > > > need to work regardless of the order it which this happened which > > > sounds somewhat challenging. I would need to think about that. > > > > So if I'm not mistaken it can be made work with the help of the > > (completely untested) attached patch (of course, the documentation > > would need to be updated too). > > > > The key observation here is that it should be fine to create a link > > from the consumer driver's probe while the supplier is still probing > > if the consumer has some way to find out that the supplier is > > functional at that point (like when it has published itself already in > > your example above). The initial state of the link can be "consumer > > probe" in that case and I don't see a reason why that might not work. > > > > Below is a more complete patch that should take all of the corner cases > into account unless I have missed anything. Testing it would be > appreciated. :-) > > --- > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > Subject: [PATCH] driver core: Fix adding device links to probing suppliers > > Currently, it is not valid to add a device link from a consumer > driver ->probe() callback to a supplier that is still probing too, but > generally this is a valid use case. For example, if the consumer has > just acquired a resource that can only be available when the supplier > is functional, adding a device link to that supplier right away > should be safe (and even desirable arguably), but device_link_add() > doesn't handle that case correctly and the initial state of the link > created by it is wrong then. > > To address this problem, change the initial state of device links > added between a probing supplier and a probing consumer to > DL_STATE_CONSUMER_PROBE and update device_links_driver_bound() to > skip such links on the supplier side. > > With this change, if the supplier probe completes first, > device_links_driver_bound() called for it will skip the link state > update and when it is called for the consumer, the link state will > be updated to "active". In turn, if the consumer probe completes > first, device_links_driver_bound() called for it will change the > state of the link to "active" and when it is called for the > supplier, the link status update will be skipped. > > However, in principle the supplier or consumer probe may still fail > after the link has been added, so modify device_links_no_driver() to > change device links in the "active" or "consumer probe" state to > "dormant" on the supplier side and update __device_links_no_driver() > to change the link state to "available" only if it is "consumer > probe" or "active". > > Then, if the supplier probe fails first, the leftover link to the > probing consumer will become "dormant" and device_links_no_driver() > called for the consumer (when its probe fails) will clean it up. > In turn, if the consumer probe fails first, it will either drop the > link, or change its state to "available" and, in the latter case, > when device_links_no_driver() is called for the supplier, it will > update the link state to "dormant". [If the supplier probe fails, > but the consumer probe succeeds, which should not happen as long as > the consumer driver is correct, the link still will be around, but > it will be "dormant" until the supplier is probed again.] > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> Pulling in a bunch of drm_bridge and drm_panel people. See huge discussion upthread, this here is Rafael's idea for making device links more useful. Would be great if someone could test this out with panel/bridge dependencies. I guess this here isn't yet solving the reprobe issue where the provider was unloaded and reloaded (which should cause the consumer to reprobe too, with the EPROBE_DEFERRED logic). I think that was the other issue we've hit. -Daniel > --- > Documentation/driver-api/device_link.rst | 10 ++-- > drivers/base/core.c | 74 +++++++++++++++++++++++++++---- > 2 files changed, 73 insertions(+), 11 deletions(-) > > Index: linux-pm/drivers/base/core.c > =================================================================== > --- linux-pm.orig/drivers/base/core.c > +++ linux-pm/drivers/base/core.c > @@ -260,17 +260,26 @@ struct device_link *device_link_add(stru > link->status = DL_STATE_NONE; > } else { > switch (supplier->links.status) { > - case DL_DEV_DRIVER_BOUND: > + case DL_DEV_PROBING: > switch (consumer->links.status) { > case DL_DEV_PROBING: > /* > - * Some callers expect the link creation during > - * consumer driver probe to resume the supplier > - * even without DL_FLAG_RPM_ACTIVE. > + * A consumer driver can create a link to a > + * supplier that has not completed its probing > + * yet as long as it knows that the supplier is > + * already functional (for example, it has just > + * acquired some resources from the supplier). > */ > - if (flags & DL_FLAG_PM_RUNTIME) > - pm_runtime_resume(supplier); > - > + link->status = DL_STATE_CONSUMER_PROBE; > + break; > + default: > + link->status = DL_STATE_DORMANT; > + break; > + } > + break; > + case DL_DEV_DRIVER_BOUND: > + switch (consumer->links.status) { > + case DL_DEV_PROBING: > link->status = DL_STATE_CONSUMER_PROBE; > break; > case DL_DEV_DRIVER_BOUND: > @@ -291,6 +300,14 @@ struct device_link *device_link_add(stru > } > > /* > + * Some callers expect the link creation during consumer driver probe to > + * resume the supplier even without DL_FLAG_RPM_ACTIVE. > + */ > + if (link->status == DL_STATE_CONSUMER_PROBE && > + flags & DL_FLAG_PM_RUNTIME) > + pm_runtime_resume(supplier); > + > + /* > * Move the consumer and all of the devices depending on it to the end > * of dpm_list and the devices_kset list. > * > @@ -474,6 +491,16 @@ void device_links_driver_bound(struct de > if (link->flags & DL_FLAG_STATELESS) > continue; > > + /* > + * Links created during consumer probe may be in the "consumer > + * probe" state to start with if the supplier is still probing > + * when they are created and they may become "active" if the > + * consumer probe returns first. Skip them here. > + */ > + if (link->status == DL_STATE_CONSUMER_PROBE || > + link->status == DL_STATE_ACTIVE) > + continue; > + > WARN_ON(link->status != DL_STATE_DORMANT); > WRITE_ONCE(link->status, DL_STATE_AVAILABLE); > } > @@ -513,17 +540,48 @@ static void __device_links_no_driver(str > > if (link->flags & DL_FLAG_AUTOREMOVE_CONSUMER) > kref_put(&link->kref, __device_link_del); > - else if (link->status != DL_STATE_SUPPLIER_UNBIND) > + else if (link->status == DL_STATE_CONSUMER_PROBE || > + link->status == DL_STATE_ACTIVE) > WRITE_ONCE(link->status, DL_STATE_AVAILABLE); > } > > dev->links.status = DL_DEV_NO_DRIVER; > } > > +/** > + * device_links_no_driver - Update links after failing driver probe. > + * @dev: Device whose driver has just failed to probe. > + * > + * Clean up leftover links to consumers for @dev and invoke > + * %__device_links_no_driver() to update links to suppliers for it as > + * appropriate. > + * > + * Links with the DL_FLAG_STATELESS flag set are ignored. > + */ > void device_links_no_driver(struct device *dev) > { > + struct device_link *link; > + > device_links_write_lock(); > + > + list_for_each_entry(link, &dev->links.consumers, s_node) { > + if (link->flags & DL_FLAG_STATELESS) > + continue; > + > + /* > + * The probe has failed, so if the status of the link is > + * "consumer probe" or "active", it must have been added by > + * a probing consumer while this device was still probing. > + * Change its state to "dormant", as it represents a valid > + * relationship, but it is not functionally meaningful. > + */ > + if (link->status == DL_STATE_CONSUMER_PROBE || > + link->status == DL_STATE_ACTIVE) > + WRITE_ONCE(link->status, DL_STATE_DORMANT); > + } > + > __device_links_no_driver(dev); > + > device_links_write_unlock(); > } > > Index: linux-pm/Documentation/driver-api/device_link.rst > =================================================================== > --- linux-pm.orig/Documentation/driver-api/device_link.rst > +++ linux-pm/Documentation/driver-api/device_link.rst > @@ -59,11 +59,15 @@ device ``->probe`` callback or a boot-ti > > Another example for an inconsistent state would be a device link that > represents a driver presence dependency, yet is added from the consumer's > -``->probe`` callback while the supplier hasn't probed yet: Had the driver > -core known about the device link earlier, it wouldn't have probed the > +``->probe`` callback while the supplier hasn't started to probe yet: Had the > +driver core known about the device link earlier, it wouldn't have probed the > consumer in the first place. The onus is thus on the consumer to check > presence of the supplier after adding the link, and defer probing on > -non-presence. > +non-presence. [Note that it is valid to create a link from the consumer's > +``->probe`` callback while the supplier is still probing, but the consumer must > +know that the supplier is functional already at the link creation time (that is > +the case, for instance, if the consumer has just acquired some resources that > +would not have been available had the supplier not been functional then).] > > If a device link is added in the ``->probe`` callback of the supplier or > consumer driver, it is typically deleted in its ``->remove`` callback for > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel