On Tue, 2024-02-27 at 20:13 +0100, Rafael J. Wysocki wrote: > On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > Hi Herve, > > > > On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote: > > > Hi Nuno, > > > > > > On Tue, 27 Feb 2024 17:55:07 +0100 > > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote: > > > > > Hi Saravana, Luca, Nuno, > > > > > > > > > > On Tue, 20 Feb 2024 16:37:05 -0800 > > > > > Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > > > > > > > index a9a292d6d59b..5c5f808b163e 100644 > > > > > > > --- a/drivers/of/overlay.c > > > > > > > +++ b/drivers/of/overlay.c > > > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id) > > > > > > > goto out; > > > > > > > } > > > > > > > > > > > > > > + /* > > > > > > > + * Wait for any ongoing device link removals before removing > > > > > > > some > > > > > > > of > > > > > > > + * nodes > > > > > > > + */ > > > > > > > + device_link_wait_removal(); > > > > > > > + > > > > > > > > > > > > Nuno in his patch[1] had this "wait" happen inside > > > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit > > > > > > the issue that Luca reported[2] in this patch series. Is there any > > > > > > problem with doing that? > > > > > > > > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and > > > > > > I don't think that's necessary. > > > > > > > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed. > > > > > > > > > > I do the device_link_wait_removal() wihout having the of_mutex locked. > > > > > > > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex > > > > > locked. > > > > > The following flow is allowed and a deadlock is present. > > > > > > > > > > of_overlay_remove() > > > > > lock(of_mutex) > > > > > device_link_wait_removal() > > > > > > > > > > And, from the workqueue jobs execution: > > > > > ... > > > > > device_put() > > > > > some_driver->remove() > > > > > of_overlay_remove() <--- The job will never end. > > > > > It is waiting for of_mutex. > > > > > Deadlock > > > > > > > > > > > > > We may need some input from Saravana (and others) on this. I might be missing > > > > something but can a put_device() lead into a driver remove callback? Driver > > > > code > > > > is > > > > not device code and put_device() leads to device_release() which will either > > > > call > > > > the > > > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO, > > > > calling > > > > of_overlay_remove() or something like that (like something that would lead to > > > > unbinding a device from it's driver) in a device release callback would be at > > > > the > > > > very least very questionable. Typically, what you see in there is > > > > of_node_put() > > > > and > > > > things like kfree() of the device itself or any other data. > > > > > > I think that calling of_overlay_remove() in a device release callback makes > > > sense. The overlay is used to declare sub-nodes from the device node. It > > > does not add/remove the device node itself but sub-nodes. > > > > > > > I think we are speaking about two different things... device release is not the > > same > > as the driver remove callback. I admit the pci case seems to be a beast of it's > > own > > and I just spent some time (given your links) on it so I can't surely be sure > > about > > what I'm about to say... But, AFAICT, I did not saw any overlay or changeset > > being > > removed from a kobj_type release callback. > > > > > The use case is the use of DT overlays to describe PCI devices. > > > https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@xxxxxxx/ > > > https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@xxxxxxxxxxx/ > > > --- 8< --- > > > The lan966x SoCs can be used in two different ways: > > > > > > - It can run Linux by itself, on ARM64 cores included in the SoC. This > > > use-case of the lan966x is currently being upstreamed, using a > > > traditional Device Tree representation of the lan996x HW blocks [1] > > > A number of drivers for the different IPs of the SoC have already > > > been merged in upstream Linux. > > > > > > - It can be used as a PCIe endpoint, connected to a separate platform > > > that acts as the PCIe root complex. In this case, all the devices > > > that are embedded on this SoC are exposed through PCIe BARs and the > > > ARM64 cores of the SoC are not used. Since this is a PCIe card, it > > > can be plugged on any platform, of any architecture supporting PCIe. > > > --- 8< --- > > > > > > This quite long story led to DT overlay support for PCI devices and so the > > > unittest I mentioned: > > > https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946 > > > > > > > > > So, I have a PCI driver that bind to the lan966x PCI board. > > > This driver loads an overlay at probe() and unload it at remove(). > > > Also, this driver can be module. A simple rmmod leads to the remove() call. > > > > > > > Hmm, and I think that would not be an issue... Note that the code that runs in > > device_link_release_fn() is doing put_device() which ends ups on the kobj_type > > release callback and so far I could not see any evidence of such a callback being > > responsible of calling device_remove() on another device. That would be weird (I > > think) since I would expect such a call to happen in a kind of unregister > > function. > > > > > This driver is not yet upstream because I haven't yet fixed all the issues I > > > encountered that's why of now, I can point only the unittest related to overlay > > > support for PCI. > > > > > > > > > > > The driver remove callback should be called when unbinding the device from > > > > it's > > > > drivers and devlinks should also be removed after device_unbind_cleanup() > > > > (i.e, > > > > after > > > > the driver remove callback). > > > > > > > > Having said the above, the driver core has lots of subtleties so, again, I > > > > can be > > > > missing something. But at this point I'm still not seeing any deadlock... > > > > > > > > > > I gave a wrong example. > > > Based on Luca's sequence he gave in > > > https://lore.kernel.org/all/20231220181627.341e8789@booty/ > > > > Regarding Luca's comments, my first approach was actually to just make the > > devlink > > removal synchronously... I'm still not sure what would be the issue of doing that > > (other than potentially waiting some time for the srcu synchronization). > > It would allow forward progress to be made, but it would add potential > delay for everybody, which is only really needed in the DT overlay > case. Sounds like something to avoid to me. I mean, we could surely detect (or have a way to do it) if the devlink is being removed as part of an overlay removal and only call device_link_release_fn() synchronously in that case. It would minimize the effects I guess. But yeah, we still need to make sure there's an actual deadlock... I'm still not seeing it but Herve may very well be correct about it. > > > About the unlock, I'm just not sure what could happen if someone else (other than > > us) sneaks in > > and grabs the of_mutex while we are in the middle of removing an overlay... > > If that is possible at all, I'd call it a bug. I think, in theory, it could happen as it looks to be a fairly coarse grained lock for OF: https://elixir.bootlin.com/linux/latest/source/drivers/of/of_private.h#L40 - Nuno Sá