On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote:
> On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
> > In the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> >
> > During the step 1, devices are destroyed and devlinks are removed.
> > During the step 2, OF nodes are destroyed but
> > __of_changeset_entry_destroy() can raise warnings related to missing
> > of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> >
> > Indeed, during the devlink removals performed at step 1, the removal
> > itself releasing the device (and the attached of_node) is done by a job
> > queued in a workqueue and so, it is done asynchronously with respect to
> > function calls.
> > When the warning is present, of_node_put() will be called but wrongly
> > too late from the workqueue job.
> >
> > In order to be sure that any ongoing devlink removals are done before
> > the of_node destruction, synchronize the of_overlay_remove() with the
> > devlink removals.
> >
> > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Herve Codina <herve.codina@xxxxxxxxxxx>
> > ---
> >  drivers/of/overlay.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index 2ae7e9d24a64..7a010a62b9d8 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c
> > @@ -8,6 +8,7 @@
> >
> >  #define pr_fmt(fmt)	"OF: overlay: " fmt
> >
> > +#include <linux/device.h>
>
> This is clearly up to the DT maintainers to decide but, IMHO, I would very much
> prefer to see fwnode.h included in here rather than directly device.h (so yeah,
> renaming the function to fwnode_*).

IMO, the DT code should know almost nothing about fwnode because that's
the layer above it. But then overlay stuff is kind of a layer above the
core DT code too.

> But yeah, I might be biased by own series :)
>
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/of.h>
> > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
> >  {
> >  	int i;
> >
> > +	/*
> > +	 * Wait for any ongoing device link removals before removing some of
> > +	 * nodes. Drop the global lock while waiting
> > +	 */
> > +	mutex_unlock(&of_mutex);
> > +	device_link_wait_removal();
> > +	mutex_lock(&of_mutex);
>
> I'm still not convinced we need to drop the lock. What happens if someone else
> grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
> we can't screw things badly?

It is also just ugly because it's the callers of
free_overlay_changeset() that hold the lock and now we're releasing it
behind their back.

As device_link_wait_removal() is called before we touch anything, can't
it be called before we take the lock? And do we need to call it if
applying the overlay fails?

Rob