All of these memleaks are related with of_node's, instances of struct device_node. When these instances are referred by other kernel instances, their refcount's are increased with of_node_put(). When the other kernel instances being freed, the dereferences of these device_node instances should have been done with of_node_put(). But actually, they are missed. That caused that refcount's of these device_node instances will never be decreased to ZERO then they and other related memory blocks will never be freed. struct device_node { const char *name; const char *type; phandle phandle; const char *full_name; struct fwnode_handle fwnode; struct property *properties; ... }; instance of device_node in memory: (On little endian system) High Address |-----------| | properties| |-----------| | full_name | |-----------| | phandle | |-----------| | type | |-----------| | name | Low Address |-----------| Details of reproduction and descriptions are below: 1. Steps to reproduce: A. In kernel, enable CONFIG_OF_UNITTEST and CONFIG_OF_OVERLAY. B. On console command line: # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak 2. Description of each patch (1) [PATCH] of: fix of_node leak caused in of_find_node_opts_by_path A. Memleaks reported as below (Little endian system -- 64bit): unreferenced object 0xffffffc3e488ed00 (size 192): comm "swapper/0", pid 1, jiffies 4294940291 (age 274.230s) hex dump (first 32 bytes): 98 73 c0 00 c0 ff ff ff 98 73 c0 00 c0 ff ff ff .s.......s...... 00 00 00 00 00 00 00 00 40 8c a7 e4 c3 ff ff ff ........@....... backtrace: [<ffffffc0001e6cd4>] create_object+0x104/0x278 [<ffffffc00097c6a8>] kmemleak_alloc+0x60/0xd0 [<ffffffc0001d5718>] kmem_cache_alloc+0x240/0x330 [<ffffffc00080254c>] __of_node_dup+0x34/0x160 [<ffffffc000d16a08>] of_unittest_changeset+0x10c/0x9a8 [<ffffffc000d198e4>] of_unittest+0xf40/0x14e4 [<ffffffc000084310>] do_one_initcall+0xc8/0x1f8 [<ffffffc000cddc18>] kernel_init_freeable+0x21c/0x2f0 [<ffffffc00097b4d8>] kernel_init+0x10/0xe0 [<ffffffc000083c10>] ret_from_fork+0x10/0x40 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffffffc3e4a78c40 (size 64): comm "swapper/0", pid 1, jiffies 4294940291 (age 274.230s) hex dump (first 32 bytes): 2f 74 65 73 74 63 61 73 65 2d 64 61 74 61 2f 63 /testcase-data/c 68 61 6e 67 65 73 65 74 2f 6e 32 00 6b 6b 6b 6b hangeset/n2.kkkk backtrace: [<ffffffc0001e6cd4>] create_object+0x104/0x278 [<ffffffc00097c6a8>] kmemleak_alloc+0x60/0xd0 [<ffffffc0001d7c28>] __kmalloc_track_caller+0x258/0x380 [<ffffffc000506f64>] kvasprintf+0x64/0xa8 [<ffffffc000802590>] __of_node_dup+0x78/0x160 [<ffffffc000d16a08>] of_unittest_changeset+0x10c/0x9a8 [<ffffffc000d198e4>] of_unittest+0xf40/0x14e4 [<ffffffc000084310>] do_one_initcall+0xc8/0x1f8 [<ffffffc000083c10>] ret_from_fork+0x10/0x40 [<ffffffffffffffff>] 0xffffffffffffffff ... ... B. Analysis of log: According to the calltrace and address of unreferenced objects, we can figure out that they are device_node instance and the memory block pointed by the field named "full_name" in device_node, since the address 0xffffffc3e4a78c40 is exactly the same as content of the third field, full_path, of device_node. In little endian system, "40 8c a7 e4 c3 ff ff ff" is ffffffc3e4a78c40. And there are other unreferenced objects, that are referred by some fields of device_node, like properties, etc. C. method of debugging: Trace where the node is gotten and where the node is put. With the name of device_node instances, we can record the device_node instance when it is created. And with the address of kobject field in device_node, we can trace where it is gotten and where it is put. Tracing patch like below: diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 1801634..6e3cda2 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -489,6 +489,8 @@ static void __init of_unittest_property_copy(void) #endif } +struct device_node *np2 = NULL; +EXPORT_SYMBOL(np2); static void __init of_unittest_changeset(void) { #ifdef CONFIG_OF_DYNAMIC @@ -502,6 +504,7 @@ static void __init of_unittest_changeset(void) unittest(n1, "testcase setup failure\n"); n2 = __of_node_dup(NULL, "/testcase-data/changeset/n2"); unittest(n2, "testcase setup failure\n"); + np2 = n2; n21 = __of_node_dup(NULL, "%s/%s", "/testcase-data/changeset/n2", "n21"); unittest(n21, "testcase setup failure %p\n", n21); nremove = of_find_node_by_path("/testcase-data/changeset/node-remove"); diff --git a/lib/kobject.c b/lib/kobject.c index 3b841b9..725613f 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -18,7 +18,9 @@ #include <linux/stat.h> #include <linux/slab.h> #include <linux/random.h> +#include <linux/of.h> +extern struct device_node *np2; /** * kobject_namespace - return @kobj's namespace tag * @kobj: kobject in question @@ -576,6 +578,10 @@ void kobject_del(struct kobject *kobj) */ struct kobject *kobject_get(struct kobject *kobj) { + if(np2 && kobj == &np2->kobj) { + dump_stack(); + printk("\trefcount = %d\n", (atomic_read(&kobj->kref.refcount) + 1)); + } if (kobj) { if (!kobj->state_initialized) WARN(1, KERN_WARNING "kobject: '%s' (%p): is not " @@ -668,6 +674,10 @@ static void kobject_release(struct kref *kref) */ void kobject_put(struct kobject *kobj) { + if(np2 && kobj == &np2->kobj) { + dump_stack(); + printk("\trefcount = %d\n", (atomic_read(&kobj->kref.refcount) - 1)); + } if (kobj) { if (!kobj->state_initialized) WARN(1, KERN_WARNING "kobject: '%s' (%p): is not " D. Analysis and fix Then tested again, with the calltraces, we can reduce the scope in of_find_node_opts_by_path(). In of_find_node_opts_by_path(), when we try to find a node matching a full path, we start at getting the top node, then getting child node of top node, then child node of child node, recursively, until getting the target or nothing. The procedure is realized by a loop: /* Step down the tree matching path components */ raw_spin_lock_irqsave(&devtree_lock, flags); if (!np) np = of_node_get(of_root); while (np && *path == '/') { path++; /* Increment past '/' delimiter */ np = __of_find_node_by_path(np, path); path = strchrnul(path, '/'); if (separator && separator < path) break; } raw_spin_unlock_irqrestore(&devtree_lock, flags); During the procedure above, every parent node's refcount is increased. Parent node in the full path is gotten before traversing its child node to find the lower node. And if parent node is gotten successfully, its refcount will be increased through __of_find_node_by_path with of_node_get(). Parent node's refcount must be decreased, because each parent node is just a step node and the lowest node in full path is the target of of_find_node_opts_by_path(). Or, it will cause refcount of device_node instances referred by parent nodes in full path will never decreased to ZERO and these instances and other related memory blocks will never be freed. To fix it, decrease refcount of parent node after __of_find_node_by_path(). (2) [PATCH] i2c: add missing of_node_put in i2c_unregister_device A. Memleaks reported as below (Little endian system -- 64bit): unreferenced object 0xffffffc0f8fc5d40 (size 192): comm "swapper/0", pid 1, jiffies 4294940674 (age 260.340s) hex dump (first 32 bytes): 00 d5 72 e8 c3 ff ff ff a8 73 c0 00 c0 ff ff ff ..r......s...... 00 00 00 00 00 00 00 00 00 ce af e4 c3 ff ff ff ................ backtrace: [<ffffffc0001e6cd4>] create_object+0x104/0x278 [<ffffffc00097c708>] kmemleak_alloc+0x60/0xd0 [<ffffffc0001d5718>] kmem_cache_alloc+0x240/0x330 [<ffffffc0008025ac>] __of_node_dup+0x34/0x160 [<ffffffc0008082cc>] of_overlay_apply_one+0x19c/0x268 [<ffffffc000808340>] of_overlay_apply_one+0x210/0x268 [<ffffffc000808848>] of_overlay_create+0x1c0/0x368 [<ffffffc000984640>] of_unittest_apply_overlay.isra.3+0xb4/0x140 [<ffffffc00098479c>] of_unittest_apply_overlay_check+0xd0/0x188 [<ffffffc000d186a4>] of_unittest_overlay+0xd50/0x1050 [<ffffffc000d19e44>] of_unittest+0x14a0/0x14e4 [<ffffffc000084310>] do_one_initcall+0xc8/0x1f8 [<ffffffc000cddc18>] kernel_init_freeable+0x21c/0x2f0 [<ffffffc00097b538>] kernel_init+0x10/0xe0 [<ffffffc000083c10>] ret_from_fork+0x10/0x40 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffffffc3e4afce00 (size 128): comm "swapper/0", pid 1, jiffies 4294940674 (age 260.340s) hex dump (first 32 bytes): 2f 74 65 73 74 63 61 73 65 2d 64 61 74 61 2f 6f /testcase-data/o 76 65 72 6c 61 79 2d 6e 6f 64 65 2f 74 65 73 74 verlay-node/test backtrace: [<ffffffc0001e6cd4>] create_object+0x104/0x278 [<ffffffc00097c708>] kmemleak_alloc+0x60/0xd0 [<ffffffc0001d7c28>] __kmalloc_track_caller+0x258/0x380 [<ffffffc000506fbc>] kvasprintf+0x64/0xa8 [<ffffffc0008025f0>] __of_node_dup+0x78/0x160 [<ffffffc0008082cc>] of_overlay_apply_one+0x19c/0x268 [<ffffffc000808340>] of_overlay_apply_one+0x210/0x268 [<ffffffc000808848>] of_overlay_create+0x1c0/0x368 [<ffffffc000984640>] of_unittest_apply_overlay.isra.3+0xb4/0x140 [<ffffffc00098479c>] of_unittest_apply_overlay_check+0xd0/0x188 [<ffffffc000d186a4>] of_unittest_overlay+0xd50/0x1050 [<ffffffc000d19e44>] of_unittest+0x14a0/0x14e4 [<ffffffc000084310>] do_one_initcall+0xc8/0x1f8 [<ffffffc000cddc18>] kernel_init_freeable+0x21c/0x2f0 [<ffffffc00097b538>] kernel_init+0x10/0xe0 [<ffffffc000083c10>] ret_from_fork+0x10/0x40 B. Analysis of log: same as (1).B. C. method of debugging: same as (1).B. D. Analysis and fix Refcount of of_node is increased with "info.of_node = of_node_get(node);" in of_i2c_register_device(), and it is referred by adpter through i2c_new_device(). When the i2c node is unregistered from i2c adapter, it must be decreased with of_node_put(). Or, kind of memory leaks as (1).A will occur. (3) [PATCH] i2c: add missing of_node_put in i2c_mux_del_adapters some kind of memory leaks as (2).A A. Memleaks reported as below (Little endian system -- 64bit): B. Analysis of log: same as (1).B. C. method of debugging: same as (1).B. D. Analysis and fix Within i2c_add_mux_adapter(), in the loop where trying to populate the mux adapter's of_node, refcount of of_node is increased if the proper of_node is found. /* * Try to populate the mux adapter's of_node, expands to * nothing if !CONFIG_OF. */ if (mux_dev->of_node) { struct device_node *child; u32 reg; for_each_child_of_node(mux_dev->of_node, child) { ret = of_property_read_u32(child, "reg", ®); if (ret) continue; if (chan_id == reg) { priv->adap.dev.of_node = child; break; } } } In the loop "for_each_child_of_node", if chan_id equals to reg, the loop will broken. Refcount of child, which is increased in this circle, will not be decreased in next circle when try to traverse its sibling. That is legal, because we want to get it. But, within i2c_mux_del_adapters(), we forget to release it. And this will cause that refcount of of_node will never be decreased to ZERO. Device_node instance referred by of_node and other related memory blocks will never be freed. To fix it, just add the missing of_node_put() in i2c_mux_del_adapter(). Qi Hou (3): of: fix of_node leak caused in of_find_node_opts_by_path i2c: add missing of_node_put in i2c_unregister_device i2c: add missing of_node_put in i2c_mux_del_adapters drivers/i2c/i2c-core.c | 1 + drivers/i2c/i2c-mux.c | 2 ++ drivers/of/base.c | 4 +++- 3 files changed, 6 insertions(+), 1 deletion(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html