On 9/9/2021 3:16 PM, Heikki Krogerus wrote: > On Thu, Sep 09, 2021 at 03:13:47PM +0300, Heikki Krogerus wrote: >> On Tue, Sep 07, 2021 at 06:59:18PM +0300, Laurentiu Tudor wrote: >>> >>> >>> On 7/26/2021 10:59 AM, Laurentiu Tudor wrote: >>>> >>>> >>>> On 7/20/2021 1:27 PM, Andy Shevchenko wrote: >>>>> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor >>>>> <laurentiu.tudor@xxxxxxx> wrote: >>>>>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote: >>>>>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote: >>>>>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote: >>>>>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko >>>>>>>>> <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote: >>>>>>>>>> >>>>>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@xxxxxxx wrote: >>>>>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@xxxxxxx> >>>>>>>>>>> >>>>>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed >>>>>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by >>>>>>>>>>> bumping it in the device_create_managed_software_node() function. >>>>>>>>>>> >>>>>>>>>>> The error [1] was encountered after adding a .shutdown() op to our >>>>>>>>>>> fsl-mc-bus driver. >>>>>>>>>> >>>>>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted >>>>>>>>>> later on), I can tell that probably something is wrong in the ->shutdown() >>>>>>>>>> method itself. >>>>>>>>> >>>>>>>>> Isn't the other alternative to just remove the second kobject_put from >>>>>>>>> KOBJ_REMOVE ? >>>>>>>> >>>>>>>> Or maybe on top of Heikki's suggestion, replace the calls to >>>>>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()? >>>>>>> >>>>>>> _noearn will hide the problem. It was there, it was removed from there. >>>>>>> Perhaps we have to understand the root cause better (some specific flow?). >>>>>>> >>>>>>> Any insight from you on the flow when the issue appears? I.o.w. what happened >>>>>>> on the big picture that we got into the warning you see? >>>>>> >>>>>> I encountered the initial issue when trying to shut down a system booted >>>>>> with ACPI but only after adding a .shutdown() callback to our bus driver >>>>>> so that the devices are properly taken down. The problem was that >>>>>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count >>>>>> twice leading to an underflow error. My initial proposal was to just >>>>>> bump the refcount in device_create_managed_software_node(). The device >>>>>> properties that triggered the problem are created here [1]. >>>>>> >>>>>> Heikko suggested that instead of manually incrementing the refcount to >>>>>> use software_node_notify(KOBJ_ADD). This triggered the second issue, a >>>>>> duplicated sysfs entry warning originating in the usb subsystem: >>>>>> device_create_managed_software_node() ends up being called twice, once >>>>>> here [2] and secondly, the place I previous mentioned [1]. >>>>> >>>>> This [3] is what I have reported against DWC3 when ->shutdown() has >>>>> been added there. And here [4] is another thread about the issue with >>>>> that callback. The ->release() callback is called at put_device() [5] >>>>> and ->shutdown() is called before that [6]. That said, can you inspect >>>>> your ->shutdown() implementation once more time and perhaps see if >>>>> there is anything that can be amended? >>>>> >>>> >>>> Will do, thanks for the pointers. It could be that we mess something out >>>> in how we use the driver model. >>>> >>> >>> Quick (and late, sorry) update from my side. I've spent time on >>> debugging our bus, did found some issues but, at least for now, none are >>> related to sw node. >>> In the mean time, I noticed in the swnode code that >>> device_add_software_node() calls software_node_notify(KOBJ_ADD) while >>> device_create_managed_software_node() doesn't. Updating [1] the later >>> with the call to software_node_notify(KOBJ_ADD) does seem to fix the >>> issue I'm seeing. >>> >>> Could this be a problem? Any comments appreciated. >>> >>> One more thing perhaps worth mentioning is that, at least for now, there >>> are few uses for this device_create_managed_software_node() api, >>> mentioning here a couple of them: >>> - arm64 iort code - this seems to be triggering the issue i'm getting >>> - dwc3 usb - Andy reported similar issues here, maybe the issue is common? >>> >>> [1] >>> @@ -1113,6 +1125,15 @@ int device_create_managed_software_node(struct >>> device *dev, >>> to_swnode(fwnode)->managed = true; >>> set_secondary_fwnode(dev, fwnode); >>> >>> + /* >>> + * If the device has been fully registered by the time this >>> function is >>> + * called, software_node_notify() must be called separately so >>> that the >>> + * symlinks get created and the reference count of the node is >>> kept in >>> + * balance. >>> + */ >>> + if (device_is_registered(dev)) >>> + software_node_notify(dev, KOBJ_ADD); >>> + >>> return 0; >>> } >> >> That should be fixed indeed. Please send that after -rc1 is out. > > I mean, resend :-) > Right, actually I just noticed that this is the fix you suggested last time we discussed. :-) I forgot about it, sorry. There's still the WARN_ON() [1] triggered by the usb subsys, apparently happening only (!) in ACPI boot scenario, so +Lorenzo. I'll delay the sending a bit to try to understand what's going on. [1] [ 11.760346] sysfs: cannot create duplicate filename '/devices/platform/808622B7:01/xhci-hcd.2.auto/software_node' [ 11.770612] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc1-00214-gbf7f1083ebd3-dirty #62 [ 11.780611] Hardware name: NXP NXP LX2160ARDB Platform, BIOS EDK II Apr 16 2021 [ 11.787913] Call trace: [ 11.790351] dump_backtrace+0x0/0x2a4 [ 11.794017] show_stack+0x1c/0x30 [ 11.797331] dump_stack_lvl+0x68/0x84 [ 11.800991] dump_stack+0x20/0x3c [ 11.804302] sysfs_warn_dup+0x88/0xac [ 11.807965] sysfs_do_create_link_sd+0xf8/0x100 [ 11.812492] sysfs_create_link+0x48/0x80 [ 11.816411] software_node_notify+0x1a8/0x35c [ 11.820769] device_create_managed_software_node+0x158/0x1b0 [ 11.826428] iort_named_component_init+0xe0/0x140 [ 11.831131] iort_iommu_configure_id+0xf4/0x270 [ 11.835660] acpi_dma_configure_id+0x160/0x200 [ 11.840101] platform_dma_configure+0xa0/0xa4 [ 11.844457] really_probe.part.0+0x84/0x480 [ 11.848639] __driver_probe_device+0xd4/0x180 [ 11.852994] driver_probe_device+0xf8/0x1e0 [ 11.857174] __driver_attach+0x108/0x220 [ 11.861095] bus_for_each_dev+0xe4/0x154 [ 11.865014] driver_attach+0x38/0x50 [ 11.868587] bus_add_driver+0x1bc/0x2c4 [ 11.872419] driver_register+0xf0/0x210 [ 11.876253] __platform_driver_register+0x48/0x60 [ 11.880956] xhci_plat_init+0x34/0x40 [ 11.884616] do_one_initcall+0xa8/0x270 [ 11.888449] kernel_init_freeable+0x2c0/0x348 [ 11.892806] kernel_init+0x28/0x140 [ 11.896295] ret_from_fork+0x10/0x18 [ 11.900062] xhci-hcd xhci-hcd.2.auto: Adding to iommu group 6 [ 11.906044] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller [ 11.911566] xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 3 [ 11.919702] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000000010010 [ 11.929187] xhci-hcd xhci-hcd.2.auto: irq 106, io mem 0x03110000 --- Best Regards, Laurentiu