Hi Heikki, On 9/10/2021 3:38 PM, Heikki Krogerus wrote: > On Fri, Sep 10, 2021 at 03:05:16PM +0300, Laurentiu Tudor wrote: >> >> >> On 9/9/2021 5:01 PM, Laurentiu Tudor wrote: >>> >>> >>> On 9/9/2021 3:16 PM, Heikki Krogerus wrote: >>>> On Thu, Sep 09, 2021 at 03:13:47PM +0300, Heikki Krogerus wrote: >>>>> On Tue, Sep 07, 2021 at 06:59:18PM +0300, Laurentiu Tudor wrote: >>>>>> >>>>>> >>>>>> On 7/26/2021 10:59 AM, Laurentiu Tudor wrote: >>>>>>> >>>>>>> >>>>>>> On 7/20/2021 1:27 PM, Andy Shevchenko wrote: >>>>>>>> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor >>>>>>>> <laurentiu.tudor@xxxxxxx> wrote: >>>>>>>>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote: >>>>>>>>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote: >>>>>>>>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote: >>>>>>>>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko >>>>>>>>>>>> <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@xxxxxxx wrote: >>>>>>>>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@xxxxxxx> >>>>>>>>>>>>>> >>>>>>>>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed >>>>>>>>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by >>>>>>>>>>>>>> bumping it in the device_create_managed_software_node() function. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The error [1] was encountered after adding a .shutdown() op to our >>>>>>>>>>>>>> fsl-mc-bus driver. >>>>>>>>>>>>> >>>>>>>>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted >>>>>>>>>>>>> later on), I can tell that probably something is wrong in the ->shutdown() >>>>>>>>>>>>> method itself. >>>>>>>>>>>> >>>>>>>>>>>> Isn't the other alternative to just remove the second kobject_put from >>>>>>>>>>>> KOBJ_REMOVE ? >>>>>>>>>>> >>>>>>>>>>> Or maybe on top of Heikki's suggestion, replace the calls to >>>>>>>>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()? >>>>>>>>>> >>>>>>>>>> _noearn will hide the problem. It was there, it was removed from there. >>>>>>>>>> Perhaps we have to understand the root cause better (some specific flow?). >>>>>>>>>> >>>>>>>>>> Any insight from you on the flow when the issue appears? I.o.w. what happened >>>>>>>>>> on the big picture that we got into the warning you see? >>>>>>>>> >>>>>>>>> I encountered the initial issue when trying to shut down a system booted >>>>>>>>> with ACPI but only after adding a .shutdown() callback to our bus driver >>>>>>>>> so that the devices are properly taken down. The problem was that >>>>>>>>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count >>>>>>>>> twice leading to an underflow error. My initial proposal was to just >>>>>>>>> bump the refcount in device_create_managed_software_node(). The device >>>>>>>>> properties that triggered the problem are created here [1]. >>>>>>>>> >>>>>>>>> Heikko suggested that instead of manually incrementing the refcount to >>>>>>>>> use software_node_notify(KOBJ_ADD). This triggered the second issue, a >>>>>>>>> duplicated sysfs entry warning originating in the usb subsystem: >>>>>>>>> device_create_managed_software_node() ends up being called twice, once >>>>>>>>> here [2] and secondly, the place I previous mentioned [1]. >>>>>>>> >>>>>>>> This [3] is what I have reported against DWC3 when ->shutdown() has >>>>>>>> been added there. And here [4] is another thread about the issue with >>>>>>>> that callback. The ->release() callback is called at put_device() [5] >>>>>>>> and ->shutdown() is called before that [6]. That said, can you inspect >>>>>>>> your ->shutdown() implementation once more time and perhaps see if >>>>>>>> there is anything that can be amended? >>>>>>>> >>>>>>> >>>>>>> Will do, thanks for the pointers. It could be that we mess something out >>>>>>> in how we use the driver model. >>>>>>> >>>>>> >>>>>> Quick (and late, sorry) update from my side. I've spent time on >>>>>> debugging our bus, did found some issues but, at least for now, none are >>>>>> related to sw node. >>>>>> In the mean time, I noticed in the swnode code that >>>>>> device_add_software_node() calls software_node_notify(KOBJ_ADD) while >>>>>> device_create_managed_software_node() doesn't. Updating [1] the later >>>>>> with the call to software_node_notify(KOBJ_ADD) does seem to fix the >>>>>> issue I'm seeing. >>>>>> >>>>>> Could this be a problem? Any comments appreciated. >>>>>> >>>>>> One more thing perhaps worth mentioning is that, at least for now, there >>>>>> are few uses for this device_create_managed_software_node() api, >>>>>> mentioning here a couple of them: >>>>>> - arm64 iort code - this seems to be triggering the issue i'm getting >>>>>> - dwc3 usb - Andy reported similar issues here, maybe the issue is common? >>>>>> >>>>>> [1] >>>>>> @@ -1113,6 +1125,15 @@ int device_create_managed_software_node(struct >>>>>> device *dev, >>>>>> to_swnode(fwnode)->managed = true; >>>>>> set_secondary_fwnode(dev, fwnode); >>>>>> >>>>>> + /* >>>>>> + * If the device has been fully registered by the time this >>>>>> function is >>>>>> + * called, software_node_notify() must be called separately so >>>>>> that the >>>>>> + * symlinks get created and the reference count of the node is >>>>>> kept in >>>>>> + * balance. >>>>>> + */ >>>>>> + if (device_is_registered(dev)) >>>>>> + software_node_notify(dev, KOBJ_ADD); >>>>>> + >>>>>> return 0; >>>>>> } >>>>> >>>>> That should be fixed indeed. Please send that after -rc1 is out. >>>> >>>> I mean, resend :-) >>>> >>> >>> Right, actually I just noticed that this is the fix you suggested last >>> time we discussed. :-) I forgot about it, sorry. >>> There's still the WARN_ON() [1] triggered by the usb subsys, apparently >>> happening only (!) in ACPI boot scenario, so +Lorenzo. >>> I'll delay the sending a bit to try to understand what's going on. >> >> I've spent some time looking into this and it turns out that in the >> ACPI case, device_create_managed_software_node() ends up being called >> twice, first here [1] and after that, in the IORT code here [2]. With >> the proposed patch this causes software_node_notify(KOBJ_ADD) being >> called twice thus triggering the dup sysfs entry warning. >> Any comments / ideas welcomed. >> >> [1] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/host.c#L111 > > I think the problem here is that the secondary fwnode get's replaced > because the primary fwnode is shared. Can you test it with this, just > to see if the problem goes away: > > diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c > index f29a264635aa1..e4b40f8b8f242 100644 > --- a/drivers/usb/dwc3/host.c > +++ b/drivers/usb/dwc3/host.c > @@ -76,7 +76,6 @@ int dwc3_host_init(struct dwc3 *dwc) > } > > xhci->dev.parent = dwc->dev; > - ACPI_COMPANION_SET(&xhci->dev, ACPI_COMPANION(dwc->dev)); > > dwc->xhci = xhci; Thanks for looking into this! Yes, this does make the issue go away. --- Best Regards, Laurentiu