On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote: > On 5/11/23 10:10, Nuno Sa wrote: > > There are valid cases where we might get in here with an unexpected > > refcount. One example are devlinks between suppliers <-> consumers. > > When a link is created, it will grab a reference to both the supplier and > > the consumer devices (which can, potentially, hold a reference to it's > > of_node) and this reference is not synchronously dropped when unbinding the > > device from the driver. Instead, a work item is queued (see > > devlink_dev_release()). This async nature will make that > > __of_changeset_entry_destroy() is reached with a refcount > 1. But, > > eventually, all the refcounts are released and of_node_release() is > > reached. > > > > All the above will lead to leaks and the following dumps: > > > > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2, > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > node /fpga-axi/dummy_dev > > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > node /regulator-vio > > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > node /regulator-vdd-1-8 > > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > node /regulator-vref > > > > 1297.168367] ------------[ cut here ]------------ > > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25 > > kobject_get+0x9c/0xa0 > > [ 1297.180514] refcount_t: addition on 0; use-after-free. > > [ 1297.185661] Modules linked in: > > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty > > #5 > > [ 1297.195617] Hardware name: Xilinx Zynq Platform > > [ 1297.200158] Workqueue: events_long device_link_release_fn > > [ 1297.205600] unwind_backtrace from show_stack+0x10/0x14 > > [ 1297.210880] show_stack from dump_stack_lvl+0x40/0x4c > > [ 1297.215983] dump_stack_lvl from __warn+0x78/0x124 > > [ 1297.220816] __warn from warn_slowpath_fmt+0x134/0x18c > > [ 1297.220857] warn_slowpath_fmt from kobject_get+0x9c/0xa0 > > [ 1297.220892] kobject_get from of_node_get+0x14/0x1c > > [ 1297.236279] of_node_get from of_fwnode_get+0x34/0x40 > > [ 1297.236324] of_fwnode_get from fwnode_full_name_string+0x34/0xa0 > > [ 1297.247455] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > [ 1297.247488] device_node_string from pointer+0x3d0/0x67c > > [ 1297.259497] pointer from vsnprintf+0x20c/0x410 > > [ 1297.264063] vsnprintf from vprintk_store+0x12c/0x430 > > [ 1297.269176] vprintk_store from vprintk_emit+0xe0/0x23c > > [ 1297.274454] vprintk_emit from vprintk_default+0x20/0x28 > > [ 1297.274503] vprintk_default from _printk+0x2c/0x58 > > [ 1297.274534] _printk from kobject_put+0x8c/0x130 > > [ 1297.289299] kobject_put from platform_device_release+0x10/0x3c > > [ 1297.295245] platform_device_release from device_release+0x30/0xa0 > > [ 1297.301458] device_release from kobject_put+0x8c/0x130 > > [ 1297.306718] kobject_put from device_link_release_fn+0x54/0xa8 > > [ 1297.312588] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > [ 1297.319079] process_one_work from worker_thread+0x50/0x54c > > [ 1297.324684] worker_thread from kthread+0xd0/0xec > > [ 1297.329421] kthread from ret_from_fork+0x14/0x2c > > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > [ 1297.339200] 5fa0: 00000000 00000000 > > 00000000 00000000 > > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > 00000000 00000000 > > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > [ 1297.362192] ---[ end trace 0000000000000000 ]--- > > [ 1297.366805] ------------[ cut here ]------------ > > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28 > > fwnode_full_name_string+0x8c/0xa0 > > [ 1297.379957] refcount_t: underflow; use-after-free. > > [ 1297.384739] Modules linked in: > > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G > > W 6.3.1-dirty #5 > > [ 1297.396136] Hardware name: Xilinx Zynq Platform > > [ 1297.400660] Workqueue: events_long device_link_release_fn > > [ 1297.406079] unwind_backtrace from show_stack+0x10/0x14 > > [ 1297.411319] show_stack from dump_stack_lvl+0x40/0x4c > > [ 1297.416389] dump_stack_lvl from __warn+0x78/0x124 > > [ 1297.421206] __warn from warn_slowpath_fmt+0x134/0x18c > > [ 1297.426361] warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0 > > [ 1297.432820] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > [ 1297.439537] device_node_string from pointer+0x3d0/0x67c > > [ 1297.444867] pointer from vsnprintf+0x20c/0x410 > > [ 1297.449406] vsnprintf from vprintk_store+0x12c/0x430 > > [ 1297.454484] vprintk_store from vprintk_emit+0xe0/0x23c > > [ 1297.459727] vprintk_emit from vprintk_default+0x20/0x28 > > [ 1297.465056] vprintk_default from _printk+0x2c/0x58 > > [ 1297.469944] _printk from kobject_put+0x8c/0x130 > > [ 1297.474570] kobject_put from platform_device_release+0x10/0x3c > > [ 1297.480507] platform_device_release from device_release+0x30/0xa0 > > [ 1297.486705] device_release from kobject_put+0x8c/0x130 > > [ 1297.491947] kobject_put from device_link_release_fn+0x54/0xa8 > > [ 1297.497798] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > [ 1297.504256] process_one_work from worker_thread+0x50/0x54c > > [ 1297.509828] worker_thread from kthread+0xd0/0xec > > [ 1297.514550] kthread from ret_from_fork+0x14/0x2c > > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > [ 1297.524308] 5fa0: 00000000 00000000 > > 00000000 00000000 > > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > 00000000 00000000 > > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > [ 1297.547256] ---[ end trace 0000000000000000 ]--- > > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, > > /regulator-vref > > [ 1297.551874] ------------[ cut here ]------------ > > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22 > > kobject_get+0x88/0xa0 > > [ 1297.551903] refcount_t: saturated; leaking memory. > > [ 1297.551908] Modules linked in: > > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G > > W 6.3.1-dirty #5 > > [ 1297.551931] Hardware name: Xilinx Zynq Platform > > [ 1297.551938] Workqueue: events_long device_link_release_fn > > [ 1297.551965] unwind_backtrace from show_stack+0x10/0x14 > > [ 1297.551994] show_stack from dump_stack_lvl+0x40/0x4c > > [ 1297.552023] dump_stack_lvl from __warn+0x78/0x124 > > [ 1297.552059] __warn from warn_slowpath_fmt+0x134/0x18c > > [ 1297.552092] warn_slowpath_fmt from kobject_get+0x88/0xa0 > > [ 1297.552123] kobject_get from of_node_get+0x14/0x1c > > [ 1297.552146] of_node_get from of_fwnode_get+0x34/0x40 > > [ 1297.552173] of_fwnode_get from fwnode_full_name_string+0x34/0xa0 > > [ 1297.552204] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > [ 1297.552234] device_node_string from pointer+0x3d0/0x67c > > [ 1297.552262] pointer from vsnprintf+0x20c/0x410 > > [ 1297.552289] vsnprintf from vscnprintf+0x10/0x24 > > [ 1297.552317] vscnprintf from printk_sprint+0x18/0xf4 > > [ 1297.552353] printk_sprint from vprintk_store+0x2d0/0x430 > > [ 1297.552388] vprintk_store from vprintk_emit+0xe0/0x23c > > [ 1297.552422] vprintk_emit from vprintk_default+0x20/0x28 > > [ 1297.552457] vprintk_default from _printk+0x2c/0x58 > > [ 1297.552483] _printk from kobject_put+0x8c/0x130 > > [ 1297.552507] kobject_put from platform_device_release+0x10/0x3c > > [ 1297.552539] platform_device_release from device_release+0x30/0xa0 > > [ 1297.552571] device_release from kobject_put+0x8c/0x130 > > [ 1297.552599] kobject_put from device_link_release_fn+0x54/0xa8 > > [ 1297.552630] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > [ 1297.552657] process_one_work from worker_thread+0x50/0x54c > > [ 1297.552674] worker_thread from kthread+0xd0/0xec > > [ 1297.552701] kthread from ret_from_fork+0x14/0x2c > > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > [ 1297.552736] 5fa0: 00000000 00000000 > > 00000000 00000000 > > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > 00000000 00000000 > > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > [ 1297.552768] ---[ end trace 0000000000000000 ]--- > > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, > > /regulator-vdd-1-8 > > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset, /fpga- > > axi/dummy_dev > > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, > > /regulator-vio > > > > the above is easily reproducible with a dummy platform device that just > > gets some regulators (devlinks are created at that point). > > > > The culprit of the above is actually this line: > > > > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366 > > > > This also makes me wonder if we should not have an extra patch just printing > > 'node->full_name' instead of '%pOF'. As seen, we should not make any > > assumptions > > (like parent's being around :)) at this stage. > > > > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset > > is already gone so if we ever reach of_node_release() we can proceed > > normally. Obviously, I'm not really sure about this and that's the whole > > reason > > why I'm bringing this as an RFC. This looks like a nasty corner case and I > > know > > that adding/removing devices dynamically is far from being easy to handle... > > > > Also not sure if the driver core folks should be aware of this... > > > > Lastly, one of workarounding this issue is by manually unbiding the device > > from the driver before removing the overlay which would (potentially) give > > time for the workers to run. > > > > Nuno Sa (1): > > of: dynamic: allow freeing of_nodes after the overlay changeset > > > > drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++---- > > include/linux/of.h | 1 + > > 2 files changed, 28 insertions(+), 4 deletions(-) > > > > Very nice problem description, thanks. > > You have found a real problem of how the devlink implementation did not > consider > the impact on overlays. The overlay removal process does not expect any node > created by an overlay apply to exist at the moment that the overlay removal > completes (to be a little more pedantic, the removal process of deletes nodes > and > properties occurs in the exact opposite order that they were created. Thus > even > partly through the overlay removal, a node is deleted before its parent is > deleted. Also, all of the properties for any node deleted during the removal > are deleted (even if the node refcount does not reach zero). > > The RFC patch does not attempt to fix the actual problem, it merely suppresses > reporting the symptom. > > -Frank > Hi Frank, Thanks for taking a look into this. I was afraid that this was to straightforward indeed. So I guess you're saying this is something that needs to be fixed at the devlink level? +cc Greg and Rafael for some help and advice :) So, I'd very much try to give this a proper fix but this is messing with very core components so any help would be welcome. The only think that comes to my mind is to have some kind of syncing between the moment (or even before) the work is queued and the moment the node is deleted. I guess this would have to happen at the kobject level... Another piece of information is that this is an issue for platform devices and pretty much for any device that releases the node during it's .dev.release callback (which BTW, sound correct to me). For example, spi and i2c devices are releasing the node during the driver - device unbind and not when the device gets released. This actually gives time for things to settle before reaching the point of deleting the node (not saying this correct or should be the solution for this case). - Nuno Sá