Hi Saravana,

On Wed, Feb 8, 2023 at 9:35 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > > > The driver core now:
> > > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > > >   supplier never has a device created for it.
> > > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > > >   be probed by a driver
> > > > > >
> > > > > > And already prevents creating a device link with the consumer as a
> > > > > > supplier of a parent.
> > > > > >
> > > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > > that the supplier is available in DT.
> > > > > >
> > > > > > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > >
> > > > > Thanks for your patch!
> > > > >
> > > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > > which is not supported upstream. Still, there may be (obscure)
> > > > > in-tree users.
> > > > >
> > > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > > instantiate a connected SPI EEPROM:
> > > >
> > > > [...]
> > > >
> > > > > The SPI controller and the SPI EEPROM are no longer instantiated.
> > > >
> > > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > > leak. I should have scrolled down further! Doesn't look like that part
> > > > is related to anything I did.
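For illustration only, the two supplier-resolution rules in the patch description above can be modeled in plain userspace C. This is a minimal sketch under stated assumptions: struct model_node and model_effective_supplier() are hypothetical names, and a single never_device flag stands in for whatever state the driver core actually tracks. It is not the fw_devlink implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a DT node (not the kernel's fwnode). */
struct model_node {
	const char *name;
	struct model_node *parent;
	bool never_device; /* no struct device will ever be created for it */
};

/*
 * Return the node whose device should act as the supplier, or NULL when
 * the supplier is to be ignored:
 *  - a supplier that never gets a device hands its consumers up to the
 *    nearest ancestor that may get one;
 *  - if no such ancestor exists, the supplier is ignored.
 */
static const struct model_node *
model_effective_supplier(const struct model_node *sup)
{
	assert(sup != NULL);
	while (sup != NULL && sup->never_device)
		sup = sup->parent;
	return sup; /* NULL means "ignore this supplier" */
}
```

In this model, a consumer of the msiof0 pins node ends up linked to the pinctrl node's device, which matches the "parent picks up the consumers" rule.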
> > > >
> > > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > > My guess is that the issue you are seeing has to do with how many of
> > > > the in-memory structs are reused vs not when an overlay is
> > > > applied/removed and some of these flags might not be getting cleared
> > > > and this is having a bigger impact with this patch (because the fwnode
> > > > links are no longer anchored on "compatible" nodes).
> > > >
> > > > With/without this patch (let's keep the series) can you look at how
> > > > the following things change between each step you do above (add,
> > > > remove, retry):
> > > > 1) List of directories under /sys/class/devlink
> > > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > >    __fwnode_link_del(), device_link_add()
> > > >
> > > > My guess is that the final solution would entail clearing
> > > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> > >
> > > You replied just as I was about to hit send. So sending this anyway...
> > >
> > > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > > that it even worked for you without this patch is a bit of a
> > > coincidence.
> > >
> > > Let's just take platform devices that are created by
> > > drivers/of/platform.c as an example.
> > >
> > > The main problem is that when you add/remove properties to a DT node
> > > of an existing platform device, nothing is really done about it at the
> > > device level. We don't even unbind and rebind the driver so the driver
> > > could make use of the new properties. We don't remove and add back the
> > > device so whoever might use the new property will use it. And if you
> > > are adding a new node, it'll only trigger any platform device level
> > > impact if it's a new node of a "simple-bus" (or similar bus) device.
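As a rough, hypothetical model of the last point: when a node is added or enabled at runtime, a platform device is only created if its parent is a populated "simple-bus"-like bus. The struct and function names below are made up for this sketch and only approximate the behavior described above, not the actual code in drivers/of/platform.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DT node as seen by a reconfig notifier. */
struct model_node {
	const char *name;
	struct model_node *parent;
	bool populated_bus; /* created devices for its children, like "simple-bus" */
	bool available;     /* status = "okay" */
	bool populated;     /* a platform device already exists for this node */
};

/*
 * Decide whether adding/enabling a node should create a platform device.
 * Only children of a populated "simple-bus"-like parent qualify; children
 * of a device that handles its own children (e.g. a pinctrl node) get
 * nothing, and property changes are not modeled at all.
 */
static bool model_should_create_device(const struct model_node *np)
{
	assert(np != NULL);
	if (!np->available || np->populated)
		return false;
	return np->parent != NULL && np->parent->populated_bus;
}
```

Under this model, a new spi@e6e90000 node under the soc bus gets a device, while a new msiof0 pins node under the pinctrl node does not.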
> > >
> > > Problem 1:
> > > So if you add a new child node to an existing probed device that adds
> > > its children explicitly (as in, the parent is not a "simple-bus"-like
> > > device), nothing will happen. The newly added child device node won't
> > > get converted into a platform device, nor will the parent device
> > > notice it. So in your case of adding msiof0_pins, it's just that when
> > > the consumer gets the pins, the driver doesn't get involved much and
> > > it's the pinctrl framework that reads the DT and figures it out.
> > >
> > > With this patch, the fwnode links point to the actual resource and the
> > > actual parent device inherits them if they don't get converted to a
> > > struct device. But since we are adding this msiof0_pins after the
> > > parent device has probed, the fwnode link isn't inherited by the
> > > parent pinctrl device.
> > >
> > > Problem 2:
> > > So if you add a property to an already bound device, nothing is done
> > > by the driver. In your overlay example, if you move the status="okay"
> > > line to be the first property you change in the msiof0 SPI device,
> > > you'll probably see that fw_devlink is no longer the one blocking the
> > > probe. This is because the platform device will get added as soon as
> > > the status flips from disabled to enabled and at that point fw_devlink
> > > will think it has no suppliers and won't do any probe deferring. And
> > > then as the new properties get added nothing will happen at the device
> > > or fw_devlink level. If msiof0's SPI driver fails immediately with an
> > > error other than -EPROBE_DEFER when the platform device is added,
> > > because it couldn't find any pinctrl property, then msiof0 will never
> > > probe (unless you remove and add the driver). If it had failed with
> > > -EPROBE_DEFER, then it might probe again if something else triggers a
> > > deferred probe attempt. Clearly, things working/not working based on
> > > the order of properties in DT is not a good implementation.
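Problem 2's ordering dependence can be reduced to a small state model. A probe that fails with a hard error is never retried, while one that returns -EPROBE_DEFER is retried on the next deferred-probe trigger. Everything here is hypothetical (MODEL_EPROBE_DEFER, struct model_dev, and the helper names). It is a sketch of the reasoning above, not the driver core's deferred-probe code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal EPROBE_DEFER (not visible to userspace). */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical model of the msiof0 platform device from Problem 2. */
struct model_dev {
	bool has_pinctrl; /* pinctrl properties already present in DT */
	bool defers;      /* driver returns -EPROBE_DEFER when they are missing */
	bool bound;
	bool deferred;    /* sitting on the deferred-probe list */
};

static int model_probe(const struct model_dev *d)
{
	if (!d->has_pinctrl)
		return d->defers ? -MODEL_EPROBE_DEFER : -ENODEV;
	return 0;
}

/* Device added (e.g. status flipped to "okay"): probe once, right now. */
static void model_device_add(struct model_dev *d)
{
	int ret = model_probe(d);

	d->bound = (ret == 0);
	d->deferred = (ret == -MODEL_EPROBE_DEFER);
}

/* Some unrelated device binds later: only deferred devices are retried. */
static void model_deferred_trigger(struct model_dev *d)
{
	if (d->deferred)
		model_device_add(d);
}
```

Adding the pinctrl properties after the device has been added changes nothing by itself. Only a driver that deferred gets a second chance on the next trigger.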
> > >
> > > Problem 3:
> > > What if you enable a previously disabled supplier? There's no way to
> > > handle that at the fw_devlink level without re-parsing the entire
> > > device tree, because existing devices might be consumers now.
> > >
> > > Anyway, long story short, it's sorta worked due to coincidence and
> > > it's quite messy to get it to work correctly.
> >
> > Several subsystems register notifiers to be informed of the events
> > above. E.g. drivers/spi/spi.c:
> >
> >         if (IS_ENABLED(CONFIG_OF_DYNAMIC))
> >                 WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
> >         if (IS_ENABLED(CONFIG_ACPI))
> >                 WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
> >
> > So my issue might be triggered using ACPI, too.
>
> Yeah, I did notice this before my email. Here's an ugly hack (at end
> of email) to test my theory about Problem 1. I didn't compile-test it
> (because I should go to bed now), but you get the idea. Can you give
> this a shot? It should fix your specific case. Basically, for all
> overlays (I hope the function is only used for overlays), we assume all
> nodes are NOT devices until they actually get added as a device.
> Don't review the code, it's not meant to be :)
>
> -Saravana
>
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
>         np->sibling = np->parent->child;
>         np->parent->child = np;
>         of_node_clear_flag(np, OF_DETACHED);
> +       np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
>  }
>
>  /**
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 81c8c227ab6b..7299cd668e51 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
>                 if (of_node_check_flag(rd->dn, OF_POPULATED))
>                         return NOTIFY_OK;
>
> +               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
>                 /* pdev_parent may be NULL when no bus platform device */
>                 pdev_parent = of_find_device_by_node(rd->dn->parent);
>                 pdev = of_platform_device_create(rd->dn, NULL,
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 15f174f4e056..1de55561b25d 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block *nb, unsigned long action,
>                         return NOTIFY_OK;
>                 }
>
> +               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
>                 spi = of_register_spi_device(ctlr, rd->dn);
>                 put_device(&ctlr->dev);

Thanks, these changes fix my SPI EEPROM in a DT overlay.

A similar change should be applied to the i2c bus core (and to other
users of of_reconfig_notifier_register()?).

For reference, the same debug output and /sys/class/devlink changes
with this fix applied can be found below.

Note that there are still a few remaining issues, for which I do not
know the full impact:
  - platform:e6060000.pinctrl--platform:keys link is not recreated on
    overlay remove,
  - There is no change in /sys/class/devlink after an add/remove/add
    cycle.

Shouldn't removing a DT overlay restore /sys/class/devlink to the exact
same state as before adding the DT overlay?
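The +/- lines in the output are a set difference between two snapshots of /sys/class/devlink. The question above amounts to asking whether the before-add and after-remove snapshots differ at all. Here is a minimal sketch of that comparison. It assumes each snapshot has already been collected as a sorted array of entry names (for example with opendir()/readdir() and qsort(), not shown). diff_snapshots() is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Compare two sorted snapshots of directory entry names and report
 * removed entries as "-name" and added ones as "+name" (if out is not
 * NULL). Returns the number of entries that differ.
 */
static size_t diff_snapshots(const char *const *before, size_t nb,
			     const char *const *after, size_t na, FILE *out)
{
	size_t i = 0, j = 0, changes = 0;

	assert((nb == 0 || before != NULL) && (na == 0 || after != NULL));
	while (i < nb || j < na) {
		int cmp;

		if (i == nb)
			cmp = 1;	/* only "after" entries left: additions */
		else if (j == na)
			cmp = -1;	/* only "before" entries left: removals */
		else
			cmp = strcmp(before[i], after[j]);

		if (cmp == 0) {		/* present in both snapshots */
			i++;
			j++;
			continue;
		}
		changes++;
		if (cmp < 0) {
			if (out)
				fprintf(out, "-%s\n", before[i]);
			i++;
		} else {
			if (out)
				fprintf(out, "+%s\n", after[j]);
			j++;
		}
	}
	return changes;
}
```

A return value of 0 for the snapshot taken before adding the overlay versus the one taken after removing it would mean the devlink state was fully restored.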
With extra FWNODE_FLAG_NOT_DEVICE handling:

  - Adding overlay:

        spi@e6e90000 Linked as a fwnode consumer to interrupt-controller@f1010000
        spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
        spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
        spi@e6e90000 Linked as a fwnode consumer to msiof0
        spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6055000.gpio
        spi@e6e90000 Dropping the fwnode link to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
        spi@e6e90000 Dropping the fwnode link to msiof0
        spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
        platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
        spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
        platform e6e90000.spi: Linked as a consumer to soc
        spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

        +platform:e6055000.gpio--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        +platform:e6060000.pinctrl--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        +platform:e6150000.clock-controller--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        +platform:soc--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:keys -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

    SPI EEPROM works

  - Removing overlay:

        platform keys: Linked as a sync state only consumer to e6055000.gpio

        -platform:e6055000.gpio--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        -platform:e6150000.clock-controller--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        -platform:soc--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

    platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

  - Adding overlay again:

    No debug output
    No change in /sys/class/devlink?!?!?
    SPI EEPROM works

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds