Quick summary: I have investigated a problem for a long time, I have a pretty good understanding, I tried various fixes but none except this is working. The goal of this patch is to discuss the problem to converge to the best solution. --------------------------------- Symptoms --------------------------------- The problem appeared while testing the v3 addon connector driver that is part of this series, and which is based on device tree overlays. Note the symptom happens often, but not always. Changes to logging is a typical way to make it appear/disappear, so it appears as time sensitive. The relevant DT overlay snippet is: / { fragment@0 { target-path = ""; __overlay__ { devices { // nodes in here are populated as platform devices reg_addon_3v3_lcd: regulator-addon-3v3-lcd { compatible = "regulator-fixed"; regulator-name = "3V3_LCD_ADDON"; gpios = <...>; }; addon_panel_dsi_lvds: panel-dsi-lvds { compatible = "..."; power-supply = <®_addon_3v3_lcd>; }; }; }; }; }; So the regulator is a supplier to the panel. Nothing special here, except we are in an overlay. The overlay gets applied and all devices work correctly. Troubles start appearing in the form of two messages on overlay removal, in this order: * WARNING: CPU: 1 PID: 189 at drivers/regulator/core.c:5856 regulator_unregister+0x1ec/0x208 This is issued during removal of the 3V3_LCD_ADDON regulator because rdev->open_count is 1, while it should be 0. This is because the panel still hasn't closed the regulator. * Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: regulator_lock_recursive+0x5c/0x200 regulator_lock_dependent+0xc0/0x140 regulator_enable+0x44/0x98 panel_simple_resume+0x38/0x108 [panel_simple] pm_generic_runtime_resume+0x34/0x58 __rpm_callback+0x50/0x1f0 rpm_callback+0x70/0x88 rpm_resume+0x49c/0x678 __pm_runtime_resume+0x54/0xa0 device_release_driver_internal+0xd4/0x240 device_release_driver+0x20/0x38 bus_remove_device+0xd4/0x120 device_del+0x154/0x388 This happens while the panel driver is being removed and the devm infra tries to close the regulator which is already gone. Both errors have the same origin: the regulator driver is removed before the panel driver. --------------------------------- Problem analysis --------------------------------- My analysis showed that the problem originates from devlink manipulation during overlay insertion, but shows its effects on removal. This is the sequence of events: * During overlay insertion: 1. the devlink code creates a devlink (A) with: supplier = regulator-addon-3v3-lcd consumer = panel-dsi-lvds flags = DL_FLAG_INFERRED (+ possibly others) because it has been inferred from firmware data 2. soon after, devlink A is relaxed and then dropped - does not happen always, based on timing - see below for details 3. the regulator-addon-3v3-lcd regulator gets probed as a platform device - the probe function for regulator-addon-3v3-lcd is reg_fixed_voltage_probe(), which calls devm_regulator_register() to register a single new regulator class device for the voltage output - regulator_register() does, among others: - instantiate a new regulator class device (3V3_LCD_ADDON), with parent = regulator-addon-3v3-lcd - adds a devlink (B) with: supplier = 3V3_LCD_ADDON consumer = panel-dsi-lvds At this point we have these devices and devlinks: .---------------------------. | regulator-addon-3v3-lcd | | regulator platform device | supplier consumer | (struct device) |<--------- devlink A -------------. '---------------------------' (inferred) | ^ V | .-----------------. | parent | panel-dsi-lvds | | | (struct device) | | '-----------------' .---------------------------. ^ | 3V3_LCD_ADDON | supplier consumer | | regulator class device |<--------- devlink B -------------' | (struct device) | (created by '---------------------------' regulator core) Depending on whether step 2 happens or not, devlink A will be still present or not during overlay removal. When step 2 happens and devlink A gets dropped (which happens to me almost always), the removal code calls: -> device_release_driver(dev = regulator-addon-3v3-lcd) -> device_release_driver_internal() -> __device_release_driver() -> if (device_links_busy()) // see below { device_links_unbind_consumers(dev = regulator-addon-3v3-lcd) -> for each consumer for which 'dev' is a supplier: { device_release_driver_internal(dev = panel-dsi-lvds) } } The logic is pretty clear: before removing a device that is a supplier to other devices (regulator-addon-3v3-lcd), use devlink to find all consumers (panel-dsi-lvds) and remove them first, recursively. However in case devlink A had been initially dropped, there is no devlink between the two devices. The regulator removal will just proceed, and the regulator device gets removed before its consumer. Note devlink B is not at all examined within this removal phase. device_links_busy() looks at the platform device (regulator-addon-3v3-lcd), and it has no way to know about the class device (3V3_LCD_ADDON). Assuming the whole device_links_busy() / device_links_unbind_consumers() logic is correct, let's move to why devlink A gets dropped. --------------------------------- Why the devlink is dropped --------------------------------- It all starts in the device_add() for regulator-addon-3v3-lcd: /* * If all driver registration is done and a newly added device doesn't * match with any driver, don't block its consumers from probing in * case the consumer device is able to operate without this supplier. */ if (dev->fwnode && fw_devlink_drv_reg_done && !dev->can_match) fw_devlink_unblock_consumers(dev); The three conditions in the if() mean: 1. this device comes from firmware -> always true in my case (device tree) 2. this global flag is set via the deferred_probe_timeout_work as soon as for 10 consecutive seconds there is no new driver being probed; it is never cleared 3. no driver has been matched with this device so far (IOW the probe function of the driver for this device has never been called, regardless of the return value) If all condtions apply, fw_devlink_unblock_consumers() will (after some checks) call fw_devlink_relax_link() on every link to consumers and "relax" it. Relaxing means setting link flags to DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE. Soon later, device_links_driver_bound() will take devlinks with these flags and drop them. I was unable to understand in full detail the flag manipulation logic happening in the devlink code. However I think the high-level logic here can be expressed as: if a devlink was inferred from firmware and its supplier device did not probe after a while (*) because no potential driver was found, then maybe that devlink was wrong or it is enforcing a supplier that is optional for the consumer: let's drop the link and see whether the (formerly devlink consumer) device can now probe. (*) "after a while" is implemented by the fw_devlink_drv_reg_done flag, which typically gets set way less than a minute after boot Basically the fw_devlink_drv_reg_done flag splits the probing in two phases. In phase 1 we try to probe all inferred suppliers before probing consumers. Then we set fw_devlink_drv_reg_done and relax+drop the "dangling" inferred devlinks. Then in phase 2 we try to probe without inferred devlinks. This is to see if we can probe more devices due to incorrectly inferred devlinks or missing drivers for optional suppliers. Overlays however can be loaded at any time, even a long time after booting. This is totally normal when used for a hotplug connector, where the devices get physically connected by the user. This implies the fw_devlink_drv_reg_done flag is found already set when probing overlay devices. And so, conditions 1 and 2 above are always set in the overlay case. So we are left with condition 3 only. Again I haven't done a full analysis here, but it is perfectly fine that a driver is not immediately present when adding a new device. It can just have not yet been matched, possibly because a driver module is in process of being loaded from storage. I think there is a race here: on one side the driver becoming available and matched to the device to be probed, on the other side the fw_devlink_unblock_consumers() logic to relax and drop inferred devlinks. If the device probes first, the link won't be dropped. --------------------------------- Same problem without DT overlays? --------------------------------- Based on the above, I suspect the exact same problem exists even without any overlay usage. Indeed, the conditions might exist in other corner cases. One example is a device whose driver is a module and is not loaded during boot: e.g. it is not on the root filesystem, it is being developed and the programmer sends the driver via SSH to a tmpfs to load and test it. As said, this is a matter of corner cases, but still possible. Note that no problem should happen to natively removable devices such as USB, because condition 1 defuses the whole if() above for devices not described in firmware. --------------------------------- Fixes I have tried (not working) --------------------------------- I tried a couple approaches based on devlink to fix this issue. One was augmenting the regulator core code to add a new devlink between the regulator platform device (regulator-addon-3v3-lcd) and the regulator class device (3V3_LCD_ADDON), to create a chain for device_links_busy() to follow. The devlink is created and persists until removal time. However it does not enforce the correct ordering: device_links_busy() ignores it because the link status is always "dormant". The reason appears to be that the "regulator output device" is a struct device but never has a driver. Recently Saravana pointed out that: > device links don't work correctly for "class" type devices (https://lore.kernel.org/all/CAGETcx-ioF=jTbyQMeD2fsYKz8q5vw_TWYWS9m8H5=pCo5KFYA@xxxxxxxxxxxxxx/) which is possibly related. I tried a variant: change the devlink already created by _regulator_get() to use the regulator platform device (regulator-addon-3v3-lcd) instead of the regulator class device (3V3_LCD_ADDON) as the supplier. That would make devlink B have the same endpoints as devlink A. However this did not work due to the link state staying "not tracked" and thus again being ignored by device_links_busy(). I haven't managed to find out the flag manipulations that would make it work. --------------------------------- Conclusions --------------------------------- The current logic is clearly OK for "typical" current use cases (not counting corner cases), but unsuitable for hotplugged devices described by firmware. The question is: do we have an underlying assumption that was valid so far but is wrong when overlays are added? One possible answer is: dropping inferred devlinks is wrong. Generally speaking, inferred devlinks are sometimes useless but don't hurt, so there is no need to drop them. This is what this patch changes. However I realize there is a use case for dropping inferred devlink: optional suppliers that prevent consumer probing until they are dropped. Indeed, inferring devlinks from firmware data can create links that prevent some device to probe. For this reason my first attempts have been to add or change the devlinks that subsystem code creates. So a more sophisticated idea is that after phase 1 we try to probe all not-probed-yet consumers ignoring the relaxed devlinks, instead of removing them. This would allow the same amount of devices to be probed using the same amount of optional suppliers, but leaving the inferred devlinks in place because they might be useful later on. And then of course there are the above solutions I failed to get working, which might be the right way but need some directions for me to have them working. I am very open to more answers and suggestions. Best regards, Luca Signed-off-by: Luca Ceresoli <luca.ceresoli@xxxxxxxxxxx> --- Changes in v4: - fix typos and update device tree names in commit message This patch first appeared in v3. --- drivers/base/core.c | 21 --------------------- 1 file changed, 21 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 82cc4069a24a..32413dd330b1 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -1876,19 +1876,6 @@ void __init wait_for_init_devices_probe(void) fw_devlink_best_effort = false; } -static void fw_devlink_unblock_consumers(struct device *dev) -{ - struct device_link *link; - - if (!fw_devlink_flags || fw_devlink_is_permissive()) - return; - - device_links_write_lock(); - list_for_each_entry(link, &dev->links.consumers, s_node) - fw_devlink_relax_link(link); - device_links_write_unlock(); -} - #define get_dev_from_fwnode(fwnode) get_device((fwnode)->dev) static bool fwnode_init_without_drv(struct fwnode_handle *fwnode) @@ -3682,14 +3669,6 @@ int device_add(struct device *dev) bus_probe_device(dev); - /* - * If all driver registration is done and a newly added device doesn't - * match with any driver, don't block its consumers from probing in - * case the consumer device is able to operate without this supplier. - */ - if (dev->fwnode && fw_devlink_drv_reg_done && !dev->can_match) - fw_devlink_unblock_consumers(dev); - if (parent) klist_add_tail(&dev->p->knode_parent, &parent->p->klist_children); -- 2.34.1