On Tue, 2024-02-20 at 16:37 -0800, Saravana Kannan wrote: > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote: > > > > In the following sequence: > > 1) of_platform_depopulate() > > 2) of_overlay_remove() > > > > During the step 1, devices are destroyed and devlinks are removed. > > During the step 2, OF nodes are destroyed but > > __of_changeset_entry_destroy() can raise warnings related to missing > > of_node_put(): > > ERROR: memory leak, expected refcount 1 instead of 2 ... > > > > Indeed, during the devlink removals performed at step 1, the removal > > itself releasing the device (and the attached of_node) is done by a job > > queued in a workqueue and so, it is done asynchronously with respect to > > function calls. > > When the warning is present, of_node_put() will be called but wrongly > > too late from the workqueue job. > > > > In order to be sure that any ongoing devlink removals are done before > > the of_node destruction, synchronize the of_overlay_remove() with the > > devlink removals. > > > > Add Fixes tag for this one too to point to the change that added the workqueue. > > Please CC Nuno and Luca in your v2 series. > > > Signed-off-by: Herve Codina <herve.codina@xxxxxxxxxxx> > > --- > > drivers/of/overlay.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > index a9a292d6d59b..5c5f808b163e 100644 > > --- a/drivers/of/overlay.c > > +++ b/drivers/of/overlay.c > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > goto out; > > } > > > > + /* > > + * Wait for any ongoing device link removals before removing some of > > + * nodes > > + */ > > + device_link_wait_removal(); > > + > > Nuno in his patch[1] had this "wait" happen inside > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > the issue that Luca reported[2] in this patch series. Is there any > problem with doing that? > In my tests, I did not saw any issue. Logically it also makes sense as you wait for all possible refcount drops just before checking your assumptions. But I might be missing something though... > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > I don't think that's necessary. Yes, I agree. queue_work() and flush_worqueue() should already have all the synchronization semantics internally. - Nuno Sá