Hi, On Wed, 20 Dec 2023 18:16:27 +0100 Luca Ceresoli <luca.ceresoli@xxxxxxxxxxx> wrote: > Hello Saravana, Rob, Hervé, > > [+Miquèl, who contributed to the discussion with Hervé and me] > > On Wed, 6 Dec 2023 19:09:06 -0800 > Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > > On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <robh@xxxxxxxxxx> wrote: > > > > > > On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote: > > > > Hi, > > > > > > +Saravana for comment > > > > I'll respond to this within a week -- very swamped at the moment. The > > main thing I want to make sure is that we don't cause an indirect > > deadlock with this wait(). I'll go back and look at why we added the > > work queue and then check for device/devlink locking issues. > > While working on a project unrelated to Hervé's work, I also ended up > in getting sporadic but frequent "ERROR: memory leak, expected refcount > 1 instead of..." messages, which persisted even after adding this patch > series on my tree. > > My use case is the insertion and removal of a simple overlay describing > a regulator-fixed and an I2C GPIO expander using it. The messages appear > regardless of whether the insertion and removal is done from kernel code > or via the configfs interface (out-of-tree patches from [0]). > > I reconstructed the sequence of operations, all of which stem from > of_overlay_remove(): > > int of_overlay_remove(int *ovcs_id) > { > ... > > device_link_wait_removal(); // proposed by this patch series > > mutex_lock(&of_mutex); > > ... > > ret = __of_changeset_revert_notify(&ovcs->cset); > // this ends up calling (excerpt from a long stack trace): > // -> of_i2c_notify > // -> device_remove > // -> devm_regulator_release > // -> device_link_remove > // -> devlink_dev_release, which queues work for > // device_link_release_fn, which in turn calls: > // -> device_put > // -> device_release > // -> {platform,regulator,...}_dev*_release > // -> of_node_put() [**] > > ... > > free_overlay_changeset(ovcs); > // calls: > // -> of_changeset_destroy > // -> __of_changeset_entry_destroy > // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d... > // The error appears or not, based on when the workqueue runs > > err_unlock: > mutex_unlock(&of_mutex); > > ... > } > > So this adds up to the question of whether devlink removal should actually > be run asynchronously or not. > > A simple short-term solution is to move the call to > device_link_wait_removal() later, just before free_overlay_changeset(): Indeed, during of_overlay_remove() notifications can be done and in Luca's use-case, they lead to some device removals and so devlink removals. That's why we move the synchronization calling device_link_wait_removal() after notifications and so just before free_overlay_changeset(). > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 1a8a6620748c..eccf08cf2160 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -1375,12 +1375,6 @@ int of_overlay_remove(int *ovcs_id) > goto out; > } > > - /* > - * Wait for any ongoing device link removals before removing some of > - * nodes > - */ > - device_link_wait_removal(); > - > mutex_lock(&of_mutex); > > ovcs = idr_find(&ovcs_idr, *ovcs_id); > @@ -1427,6 +1421,14 @@ int of_overlay_remove(int *ovcs_id) > if (!ret) > ret = ret_tmp; > > + /* > + * Wait for any ongoing device link removals before removing some of > + * nodes > + */ > + mutex_unlock(&of_mutex); > + device_link_wait_removal(); > + mutex_lock(&of_mutex); > + > free_overlay_changeset(ovcs); > > err_unlock: > > > This obviously raises the question of whether unlocking and re-locking > the mutex is potentially dangerous. I have no answer to this right away, > but I tested this change with CONFIG_PROVE_LOCKING=y and no issue showed > up after several overlay load/unload sequences so I am not aware of any > actual issues with this change. > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/overlays > > Luca Thanks Luca for this complementary use-case related to this issue. Hervé -- Hervé Codina, Bootlin Embedded Linux and Kernel engineering https://bootlin.com