Re: [RFC PATCH 0/1] of: dynamic: allow freeing of_nodes after the overlay changeset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Frank,

Sorry to bother again :)... See below

On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote:
> On 5/11/23 10:10, Nuno Sa wrote:
> > There are valid cases where we might get in here with an unexpected
> > refcount. One example are devlinks between suppliers <-> consumers.
> > When a link is created, it will grab a reference to both the supplier and
> > the consumer devices (which can, potentially, hold a reference to it's
> > of_node) and this reference is not synchronously dropped when unbinding the
> > device from the driver. Instead, a work item is queued (see
> > devlink_dev_release()). This async nature will make that
> > __of_changeset_entry_destroy() is reached with a refcount > 1. But,
> > eventually, all the refcounts are released and of_node_release() is
> > reached.
> > 
> > All the above will lead to leaks and the following dumps:
> > 
> > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /fpga-axi/dummy_dev
> > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vio
> > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vdd-1-8
> > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vref
> > 
> >  1297.168367] ------------[ cut here ]------------
> > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25
> > kobject_get+0x9c/0xa0
> > [ 1297.180514] refcount_t: addition on 0; use-after-free.
> > [ 1297.185661] Modules linked in:
> > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty
> > #5
> > [ 1297.195617] Hardware name: Xilinx Zynq Platform
> > [ 1297.200158] Workqueue: events_long device_link_release_fn
> > [ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
> > [ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.259497]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.274503]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.274534]  _printk from kobject_put+0x8c/0x130
> > [ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.295245]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.301458]  device_release from kobject_put+0x8c/0x130
> > [ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.324684]  worker_thread from kthread+0xd0/0xec
> > [ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.339200] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.362192] ---[ end trace 0000000000000000 ]---
> > [ 1297.366805] ------------[ cut here ]------------
> > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28
> > fwnode_full_name_string+0x8c/0xa0
> > [ 1297.379957] refcount_t: underflow; use-after-free.
> > [ 1297.384739] Modules linked in:
> > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.396136] Hardware name: Xilinx Zynq Platform
> > [ 1297.400660] Workqueue: events_long device_link_release_fn
> > [ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
> > [ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.444867]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.465056]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.469944]  _printk from kobject_put+0x8c/0x130
> > [ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.480507]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.486705]  device_release from kobject_put+0x8c/0x130
> > [ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.509828]  worker_thread from kthread+0xd0/0xec
> > [ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.524308] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.547256] ---[ end trace 0000000000000000 ]---
> > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vref
> > [ 1297.551874] ------------[ cut here ]------------
> > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22
> > kobject_get+0x88/0xa0
> > [ 1297.551903] refcount_t: saturated; leaking memory.
> > [ 1297.551908] Modules linked in:
> > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.551931] Hardware name: Xilinx Zynq Platform
> > [ 1297.551938] Workqueue: events_long device_link_release_fn
> > [ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
> > [ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.552262]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
> > [ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
> > [ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
> > [ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.552457]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.552483]  _printk from kobject_put+0x8c/0x130
> > [ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.552539]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.552571]  device_release from kobject_put+0x8c/0x130
> > [ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.552674]  worker_thread from kthread+0xd0/0xec
> > [ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.552736] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.552768] ---[ end trace 0000000000000000 ]---
> > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vdd-1-8
> > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-
> > axi/dummy_dev
> > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vio
> > 
> > the above is easily reproducible with a dummy platform device that just
> > gets some regulators (devlinks are created at that point).
> > 
> > The culprit of the above is actually this line:
> > 
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366
> > 
> > This also makes me wonder if we should not have an extra patch just printing
> > 'node->full_name' instead of '%pOF'. As seen, we should not make any
> > assumptions
> > (like parent's being around :)) at this stage.
> > 
> > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
> > is already gone so if we ever reach of_node_release() we can proceed
> > normally. Obviously, I'm not really sure about this and that's the whole
> > reason
> > why I'm bringing this as an RFC. This looks like a nasty corner case and I
> > know
> > that adding/removing devices dynamically is far from being easy to handle...
> > 
> > Also not sure if the driver core folks should be aware of this...
> > 
> > Lastly, one of workarounding this issue is by manually unbiding the device
> > from the driver before removing the overlay which would (potentially) give
> > time for the workers to run.
> > 
> > Nuno Sa (1):
> >   of: dynamic: allow freeing of_nodes after the overlay changeset
> > 
> >  drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
> >  include/linux/of.h   |  1 +
> >  2 files changed, 28 insertions(+), 4 deletions(-)
> > 
> 
> Very nice problem description, thanks.
> 
> You have found a real problem of how the devlink implementation did not
> consider
> the impact on overlays.  The overlay removal process does not expect any node
> created by an overlay apply to exist at the moment that the overlay removal
> completes (to be a little more pedantic, the removal process of deletes nodes
> and
> properties occurs in the exact opposite order that they were created.  Thus
> even
> partly through the overlay removal, a node is deleted before its parent is
> deleted.  Also, all of the properties for any node deleted during the removal
> are deleted (even if the node refcount does not reach zero).
> 
> The RFC patch does not attempt to fix the actual problem, it merely suppresses
> reporting the symptom.
> 

So I agree this patch was not fixing the real problem and was clumsy and I'm
aware the following is also kind of a workaround but hopefully one more
acceptable:


diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index cd3821a6444f..550370555bf7 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -461,11 +461,23 @@ struct device_node *__of_node_dup(const struct device_node
*np,
        return NULL;
 }

+static unsigned int __of_check_kref(struct kref *kref)
+{
+       unsigned int cnt = kref_read(kref);
+
+       if (cnt <= 1)
+               return 0;
+
+       msleep(50);
+
+       return kref_read(kref);
+}
+
 static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
 {
        if (ce->action == OF_RECONFIG_ATTACH_NODE &&
            of_node_check_flag(ce->np, OF_OVERLAY)) {
-               if (kref_read(&ce->np->kobj.kref) > 1) {
+               if (__of_check_kref(&ce->np->kobj.kref) > 1) {
                        pr_err("ERROR: memory leak, expected refcount 1 instead
of %d, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach
overlay node %pOF\n",
                               kref_read(&ce->np->kobj.kref), ce->np);
                } else {

----

This would still need a proper comment before the sleep and maybe better fine
tune the sleep time (and actually test it :)). 

The thing is, I'm guessing the dev_link stuff is async for some reason and I'm
fairly sure that changing it to a synchronous fashion is likely a NAK. So the
only other thing that comes to my mind would be some kind of synchronization at
the kobject level which would probably feel a bit hacky. Anyways, bottom line,
in the overlay code path (and assuming the async nature won't change) we would
still need to do some kind of waiting and probably with some timeout (so we
don't wait indefinitely). Hence, the above patch does not feel that bad as we
are anyways doing some waiting to see if all the references where released in
the meantime. And yes, if after X time, we still have kref > 1, then possible
there's a real bug of someone taking a lot of time to release the reference (or
not releasing at all).

Would the above patch be something more acceptable?

- Nuno Sá





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux