> Date: Tue, 23 Jul 2019 17:10:55 -0700 > Subject: [PATCH v7 2/7] driver core: Add edit_links() callback for drivers > From: Saravana Kannan <saravanak@xxxxxxxxxx> > > The driver core/bus adding supplier-consumer dependencies by default > enables functional dependencies to be tracked correctly even when the > consumer devices haven't had their drivers registered or loaded (if they > are modules). enables functional dependencies to be tracked correctly before the consumer device drivers are registered or loaded (if they are modules). > > However, when the bus incorrectly adds dependencies that it shouldn't ^^^ driver core/bus > have added, the devices might never probe. Explain what causes a dependency to be incorrectly added. Is this a bug in the dependency detection code? Are there cases where the dependency detection code can not reliably determine whether there truly is a dependency? > > For example, if device-C is a consumer of device-S and they have > phandles to each other in DT, the following could happen: > > 1. Device-S get added first. > 2. The bus add_links() callback will (incorrectly) try to link it as > a consumer of device-C. > 3. Since device-C isn't present, device-S will be put in > "waiting-for-supplier" list. > 4. Device-C gets added next. > 5. All devices in "waiting-for-supplier" list are retried for linking. > 6. Device-S gets linked as consumer to Device-C. > 7. The bus add_links() callback will (correctly) try to link it as > a consumer of device-S. > 8. This isn't allowed because it would create a cyclic device links. > > Neither devices will get probed since the supplier is marked as > dependent on the consumer. And the consumer will never probe because the > consumer can't get resources from the supplier. > > Without this patch, things stay in this broken state. However, with this > patch, the execution will continue like this: > > 9. Device-C's driver is loaded. Change comment to: For example, if device-C is a consumer of device-S and they have phandles referencing each other in the devicetree, the following could happen: 1. Device-S is added first. - The bus add_links() callback will (incorrectly) link device-S as a consumer of device-C, and device-S will be put in the "wait_for_suppliers" list. 2. Device-C is added next. - All devices in the "wait_for_suppliers" list are retried for linking. - Device-S remains linked as a consumer to device-C. - The bus add_links() callback will (correctly) try to link device-C as a consumer of device-S. - The link attempt will fail because it would create a cyclic device link, and device-C will be put in the "wait_for_suppliers" list. Device-S will not be probed because it is in the "wait_for_suppliers" list. Device-C will not be probed because it is in the "wait_for_suppliers" list. > > Without this patch, things stay in this broken state. However, with this > patch, the execution will continue like this: > > 9. Device-C's driver is loaded. What is "loaded"? Does that mean the device-C probe succeeds? What causes device-C to be probed? The normal processing of -EPROBE_DEFER devices? > 10. Device-C's driver removes Device-S as a consumer of Device-C. > 11. Device-C's driver adds Device-C as a consumer of Device-S. > 12. Device-S probes. > 14. Device-C probes. > > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx> > --- > drivers/base/core.c | 24 ++++++++++++++++++++++-- > drivers/base/dd.c | 29 +++++++++++++++++++++++++++++ > include/linux/device.h | 18 ++++++++++++++++++ > 3 files changed, 69 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 1b4eb221968f..733d8a9aec76 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -422,6 +422,19 @@ static void device_link_wait_for_supplier(struct device *consumer) > mutex_unlock(&wfs_lock); > } > > +/** > + * device_link_remove_from_wfs - Unmark device as waiting for supplier > + * @consumer: Consumer device > + * > + * Unmark the consumer device as waiting for suppliers to become available. > + */ > +void device_link_remove_from_wfs(struct device *consumer) Misleading function name. Incorrect description. Does not remove consumer from list wait_for_suppliers. At best, consumer might eventually get removed from list wait_for_suppliers if device_link_check_waiting_consumers() is called again. > +{ > + mutex_lock(&wfs_lock); > + list_del_init(&consumer->links.needs_suppliers); > + mutex_unlock(&wfs_lock); > +} > + > /** > * device_link_check_waiting_consumers - Try to unmark waiting consumers > * > @@ -439,12 +452,19 @@ static void device_link_wait_for_supplier(struct device *consumer) > static void device_link_check_waiting_consumers(void) > { > struct device *dev, *tmp; > + int ret; > > mutex_lock(&wfs_lock); > list_for_each_entry_safe(dev, tmp, &wait_for_suppliers, > - links.needs_suppliers) > - if (!dev->bus->add_links(dev)) > + links.needs_suppliers) { > + ret = 0; > + if (dev->has_edit_links) > + ret = driver_edit_links(dev); > + else if (dev->bus->add_links) > + ret = dev->bus->add_links(dev); > + if (!ret) > list_del_init(&dev->links.needs_suppliers); > + } > mutex_unlock(&wfs_lock); > } > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 994a90747420..5e7041ede0d7 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -698,6 +698,12 @@ int driver_probe_device(struct device_driver *drv, struct device *dev) > pr_debug("bus: '%s': %s: matched device %s with driver %s\n", > drv->bus->name, __func__, dev_name(dev), drv->name); > > + if (drv->edit_links) { > + if (drv->edit_links(dev)) > + dev->has_edit_links = true; > + else > + device_link_remove_from_wfs(dev); > + } For the purposes of the following paragraphs, I refer to dev as "dev_1" to distinguish it from a new dev that will be encountered later. The following paragraphs assume dev_1 has a supplier dependency for a supplier that has not probed yet. Q. Why the extra level of indirection? A. really_probe() does not set dev->driver before returning if device_links_check_suppliers() returned -EPROBE_DEFER. Thus device_link_check_waiting_consumers() can not directly check "if (dev_1->driver->edit_links)". The added driver_probe_device() code is setting dev_1->has_edit_links in the probe path, then device_link_check_waiting_consumers() will use the value of dev_1->has_edit_links instead of directly checking "if (dev_1->driver->edit_links)". If really_probe() was modified to set dev->driver in this case then the need for dev->has_edit_links is removed and driver_edit_links() is not needed, since dev->driver would be available. Removing driver_edit_links() simplifies the code. device_add() calls dev_1->bus->add_links(dev_1), thus dev_1 will have the supplier links set (for any suppliers not currently available) and be on list wait_for_suppliers. Then device_add() calls bus_probe_device(), leading to calling driver_probe_device(). The above code fragment either sets dev_1->has_edit_links or removes the needs_suppliers links from dev_1. dev_1 remains on list wait_for_suppliers. If (drv->edit_links(dev_1) returns 0 then device_link_remove_from_wfs() removes the supplier links. Shouldn't device_link_remove_from_wfs() also remove the device from the list wait_for_suppliers? The next time a device is added, device_link_check_waiting_consumers() will be called and dev_1 will be on list wait_for_suppliers, thus device_link_check_waiting_consumers() will find dev_1->has_edit_links true and thus call driver_edit_links() instead of calling dev->bus->add_links(). The comment in device.h, later in this patch, says that drv->edit_links() is responsible for editing the device links for dev. The comment provides no guidance on how drv->edit_links() is supposed to determine what edits to perform. No example drv->edit_links() function is provided in this patch series. dev_1->bus->add_links(dev_1) may have added one or more suppliers to its needs_suppliers link. drv->edit_links() needs to be able to handle all possible variants of what suppliers are on the needs_suppliers link. > pm_runtime_get_suppliers(dev); > if (dev->parent) > pm_runtime_get_sync(dev->parent); > @@ -786,6 +792,29 @@ struct device_attach_data { > bool have_async; > }; > > +static int __driver_edit_links(struct device_driver *drv, void *data) > +{ > + struct device *dev = data; > + > + if (!drv->edit_links) > + return 0; > + > + if (driver_match_device(drv, dev) <= 0) > + return 0; > + > + return drv->edit_links(dev); > +} > + > +int driver_edit_links(struct device *dev) > +{ > + int ret; > + > + device_lock(dev); > + ret = bus_for_each_drv(dev->bus, NULL, dev, __driver_edit_links); > + device_unlock(dev); > + return ret; > +} > + > static int __device_attach_driver(struct device_driver *drv, void *_data) > { > struct device_attach_data *data = _data; > diff --git a/include/linux/device.h b/include/linux/device.h > index 5d70babb7462..35aed50033c4 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -263,6 +263,20 @@ enum probe_type { > * @probe_type: Type of the probe (synchronous or asynchronous) to use. > * @of_match_table: The open firmware table. > * @acpi_match_table: The ACPI match table. > + * @edit_links: Called to allow a matched driver to edit the device links the Where is the value of field edit_links set? Is it only in an out of tree driver? If so, I would like to see an example implementation of the edit_links() function. > + * bus might have added incorrectly. This will be useful to handle > + * cases where the bus incorrectly adds functional dependencies > + * that aren't true or tries to create cyclic dependencies. But > + * doesn't correctly handle functional dependencies that are > + * missed by the bus as the supplier's sync_state might get to > + * execute before the driver for a missing consumer is loaded and > + * gets to edit the device links for the consumer. > + * > + * This function might be called multiple times after a new device > + * is added. The function is expected to create all the device > + * links for the new device and return 0 if it was completed > + * successfully or return an error if it needs to be reattempted > + * in the future. > * @probe: Called to query the existence of a specific device, > * whether this driver can work with it, and bind the driver > * to a specific device. > @@ -302,6 +316,7 @@ struct device_driver { > const struct of_device_id *of_match_table; > const struct acpi_device_id *acpi_match_table; > > + int (*edit_links)(struct device *dev); > int (*probe) (struct device *dev); > int (*remove) (struct device *dev); > void (*shutdown) (struct device *dev); > @@ -1078,6 +1093,7 @@ struct device { > bool offline_disabled:1; > bool offline:1; > bool of_node_reused:1; > + bool has_edit_links:1; Add has_edit_links to the struct's kernel_doc > #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ > defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ > defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) > @@ -1329,6 +1345,7 @@ extern int __must_check device_attach(struct device *dev); > extern int __must_check driver_attach(struct device_driver *drv); > extern void device_initial_probe(struct device *dev); > extern int __must_check device_reprobe(struct device *dev); > +extern int driver_edit_links(struct device *dev); > > extern bool device_is_bound(struct device *dev); > > @@ -1419,6 +1436,7 @@ struct device_link *device_link_add(struct device *consumer, > struct device *supplier, u32 flags); > void device_link_del(struct device_link *link); > void device_link_remove(void *consumer, struct device *supplier); > +void device_link_remove_from_wfs(struct device *consumer); > > #ifndef dev_fmt > #define dev_fmt(fmt) fmt > -- > 2.22.0.709.g102302147b-goog > >