On Wed, Mar 6, 2024 at 7:56 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > On Wed, Mar 6, 2024 at 4:24 PM Herve Codina <herve.codina@xxxxxxxxxxx> wrote: > > > > Hi Rafael, > > > > On Wed, 6 Mar 2024 13:48:37 +0100 > > "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote: > > > > > On Wed, Mar 6, 2024 at 9:51 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote: > > > > > > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") > > > > introduces a workqueue to release the consumer and supplier devices used > > > > in the devlink. > > > > In the job queued, devices are release and in turn, when all the > > > > references to these devices are dropped, the release function of the > > > > device itself is called. > > > > > > > > Nothing is present to provide some synchronisation with this workqueue > > > > in order to ensure that all ongoing releasing operations are done and > > > > so, some other operations can be started safely. > > > > > > > > For instance, in the following sequence: > > > > 1) of_platform_depopulate() > > > > 2) of_overlay_remove() > > > > > > > > During the step 1, devices are released and related devlinks are removed > > > > (jobs pushed in the workqueue). > > > > During the step 2, OF nodes are destroyed but, without any > > > > synchronisation with devlink removal jobs, of_overlay_remove() can raise > > > > warnings related to missing of_node_put(): > > > > ERROR: memory leak, expected refcount 1 instead of 2 > > > > > > > > Indeed, the missing of_node_put() call is going to be done, too late, > > > > from the workqueue job execution. > > > > > > > > Introduce device_link_wait_removal() to offer a way to synchronize > > > > operations waiting for the end of devlink removals (i.e. end of > > > > workqueue jobs). > > > > Also, as a flushing operation is done on the workqueue, the workqueue > > > > used is moved from a system-wide workqueue to a local one. > > > > > > > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") > > > > > > No, it is not fixed by this patch. > > > > Was explicitly asked by Saravana on v1 review: > > https://lore.kernel.org/linux-kernel/CAGETcx9uP86EHyKJNifBMd23oCsA+KpMa+e36wJEEnHDve+Avg@xxxxxxxxxxxxxx/ > > Well, I don't think this is a valid request, sorry. > > > The commit 80dd33cf72d1 introduces the workqueue and so some asynchronous tasks > > on removal. > > This patch and the next one allows to re-sync execution waiting for jobs in > > the workqueue when it is needed. > > I get this, but still, this particular individual patch by itself > doesn't fix anything. Do you agree with this? > > If somebody applies this patch without the next one in the series, > they will not get any change in behavior, so the tag is at least > misleading. > > You can claim that the next patch on top of this one fixes things, so > adding a Fixes tag to the other patch would be fine. > > There is an explicit dependency between them (the second patch is not > even applicable without the first one, or if it is, the resulting code > won't compile anyway), but you can make a note to the maintainer that > they need to go to -stable together. > > > > > > > In fact, the only possibly observable effect of this patch is the > > > failure when the allocation of device_link_wq fails AFAICS. > > > > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > > > So why? > > > > Cc:stable is needed as this patch is a prerequisite of patch 2 (needed > > to fix the asynchronous workqueue task issue). > > Dependencies like this can be expressed in "Cc: stable" tags. > Personally, I'd do it like this: > > Cc: stable@xxxxxxxxxxxxxxx # 5.13: Depends on the first patch in the series I'm okay with this too. I personally think it's better to list "Fixes: xyz" in all the patches that are needed to fix xyz (especially when there's no compile time dependency on earlier patches), but it's not a hill I'll die on. And if Rafael's suggestion is the expected norm, then I'll remember to follow that in the future. -Saravana