On Mon, 2023-05-15 at 08:57 +0200, Nuno Sá wrote: > On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote: > > On 5/11/23 10:10, Nuno Sa wrote: > > > There are valid cases where we might get in here with an unexpected > > > refcount. One example are devlinks between suppliers <-> consumers. > > > When a link is created, it will grab a reference to both the supplier and > > > the consumer devices (which can, potentially, hold a reference to it's > > > of_node) and this reference is not synchronously dropped when unbinding the > > > device from the driver. Instead, a work item is queued (see > > > devlink_dev_release()). This async nature will make that > > > __of_changeset_entry_destroy() is reached with a refcount > 1. But, > > > eventually, all the refcounts are released and of_node_release() is > > > reached. > > > > > > All the above will lead to leaks and the following dumps: > > > > > > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2, > > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > > node /fpga-axi/dummy_dev > > > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > > node /regulator-vio > > > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > > node /regulator-vdd-1-8 > > > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3, > > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay > > > node /regulator-vref > > > > > > 1297.168367] ------------[ cut here ]------------ > > > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25 > > > kobject_get+0x9c/0xa0 > > > [ 1297.180514] refcount_t: addition on 0; use-after-free. > > > [ 1297.185661] Modules linked in: > > > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty > > > #5 > > > [ 1297.195617] Hardware name: Xilinx Zynq Platform > > > [ 1297.200158] Workqueue: events_long device_link_release_fn > > > [ 1297.205600] unwind_backtrace from show_stack+0x10/0x14 > > > [ 1297.210880] show_stack from dump_stack_lvl+0x40/0x4c > > > [ 1297.215983] dump_stack_lvl from __warn+0x78/0x124 > > > [ 1297.220816] __warn from warn_slowpath_fmt+0x134/0x18c > > > [ 1297.220857] warn_slowpath_fmt from kobject_get+0x9c/0xa0 > > > [ 1297.220892] kobject_get from of_node_get+0x14/0x1c > > > [ 1297.236279] of_node_get from of_fwnode_get+0x34/0x40 > > > [ 1297.236324] of_fwnode_get from fwnode_full_name_string+0x34/0xa0 > > > [ 1297.247455] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > > [ 1297.247488] device_node_string from pointer+0x3d0/0x67c > > > [ 1297.259497] pointer from vsnprintf+0x20c/0x410 > > > [ 1297.264063] vsnprintf from vprintk_store+0x12c/0x430 > > > [ 1297.269176] vprintk_store from vprintk_emit+0xe0/0x23c > > > [ 1297.274454] vprintk_emit from vprintk_default+0x20/0x28 > > > [ 1297.274503] vprintk_default from _printk+0x2c/0x58 > > > [ 1297.274534] _printk from kobject_put+0x8c/0x130 > > > [ 1297.289299] kobject_put from platform_device_release+0x10/0x3c > > > [ 1297.295245] platform_device_release from device_release+0x30/0xa0 > > > [ 1297.301458] device_release from kobject_put+0x8c/0x130 > > > [ 1297.306718] kobject_put from device_link_release_fn+0x54/0xa8 > > > [ 1297.312588] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > > [ 1297.319079] process_one_work from worker_thread+0x50/0x54c > > > [ 1297.324684] worker_thread from kthread+0xd0/0xec > > > [ 1297.329421] kthread from ret_from_fork+0x14/0x2c > > > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > > [ 1297.339200] 5fa0: 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > > [ 1297.362192] ---[ end trace 0000000000000000 ]--- > > > [ 1297.366805] ------------[ cut here ]------------ > > > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28 > > > fwnode_full_name_string+0x8c/0xa0 > > > [ 1297.379957] refcount_t: underflow; use-after-free. > > > [ 1297.384739] Modules linked in: > > > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G > > > W 6.3.1-dirty #5 > > > [ 1297.396136] Hardware name: Xilinx Zynq Platform > > > [ 1297.400660] Workqueue: events_long device_link_release_fn > > > [ 1297.406079] unwind_backtrace from show_stack+0x10/0x14 > > > [ 1297.411319] show_stack from dump_stack_lvl+0x40/0x4c > > > [ 1297.416389] dump_stack_lvl from __warn+0x78/0x124 > > > [ 1297.421206] __warn from warn_slowpath_fmt+0x134/0x18c > > > [ 1297.426361] warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0 > > > [ 1297.432820] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > > [ 1297.439537] device_node_string from pointer+0x3d0/0x67c > > > [ 1297.444867] pointer from vsnprintf+0x20c/0x410 > > > [ 1297.449406] vsnprintf from vprintk_store+0x12c/0x430 > > > [ 1297.454484] vprintk_store from vprintk_emit+0xe0/0x23c > > > [ 1297.459727] vprintk_emit from vprintk_default+0x20/0x28 > > > [ 1297.465056] vprintk_default from _printk+0x2c/0x58 > > > [ 1297.469944] _printk from kobject_put+0x8c/0x130 > > > [ 1297.474570] kobject_put from platform_device_release+0x10/0x3c > > > [ 1297.480507] platform_device_release from device_release+0x30/0xa0 > > > [ 1297.486705] device_release from kobject_put+0x8c/0x130 > > > [ 1297.491947] kobject_put from device_link_release_fn+0x54/0xa8 > > > [ 1297.497798] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > > [ 1297.504256] process_one_work from worker_thread+0x50/0x54c > > > [ 1297.509828] worker_thread from kthread+0xd0/0xec > > > [ 1297.514550] kthread from ret_from_fork+0x14/0x2c > > > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > > [ 1297.524308] 5fa0: 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > > [ 1297.547256] ---[ end trace 0000000000000000 ]--- > > > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, > > > /regulator-vref > > > [ 1297.551874] ------------[ cut here ]------------ > > > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22 > > > kobject_get+0x88/0xa0 > > > [ 1297.551903] refcount_t: saturated; leaking memory. > > > [ 1297.551908] Modules linked in: > > > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G > > > W 6.3.1-dirty #5 > > > [ 1297.551931] Hardware name: Xilinx Zynq Platform > > > [ 1297.551938] Workqueue: events_long device_link_release_fn > > > [ 1297.551965] unwind_backtrace from show_stack+0x10/0x14 > > > [ 1297.551994] show_stack from dump_stack_lvl+0x40/0x4c > > > [ 1297.552023] dump_stack_lvl from __warn+0x78/0x124 > > > [ 1297.552059] __warn from warn_slowpath_fmt+0x134/0x18c > > > [ 1297.552092] warn_slowpath_fmt from kobject_get+0x88/0xa0 > > > [ 1297.552123] kobject_get from of_node_get+0x14/0x1c > > > [ 1297.552146] of_node_get from of_fwnode_get+0x34/0x40 > > > [ 1297.552173] of_fwnode_get from fwnode_full_name_string+0x34/0xa0 > > > [ 1297.552204] fwnode_full_name_string from device_node_string+0x5a8/0x750 > > > [ 1297.552234] device_node_string from pointer+0x3d0/0x67c > > > [ 1297.552262] pointer from vsnprintf+0x20c/0x410 > > > [ 1297.552289] vsnprintf from vscnprintf+0x10/0x24 > > > [ 1297.552317] vscnprintf from printk_sprint+0x18/0xf4 > > > [ 1297.552353] printk_sprint from vprintk_store+0x2d0/0x430 > > > [ 1297.552388] vprintk_store from vprintk_emit+0xe0/0x23c > > > [ 1297.552422] vprintk_emit from vprintk_default+0x20/0x28 > > > [ 1297.552457] vprintk_default from _printk+0x2c/0x58 > > > [ 1297.552483] _printk from kobject_put+0x8c/0x130 > > > [ 1297.552507] kobject_put from platform_device_release+0x10/0x3c > > > [ 1297.552539] platform_device_release from device_release+0x30/0xa0 > > > [ 1297.552571] device_release from kobject_put+0x8c/0x130 > > > [ 1297.552599] kobject_put from device_link_release_fn+0x54/0xa8 > > > [ 1297.552630] device_link_release_fn from process_one_work+0x1fc/0x4c8 > > > [ 1297.552657] process_one_work from worker_thread+0x50/0x54c > > > [ 1297.552674] worker_thread from kthread+0xd0/0xec > > > [ 1297.552701] kthread from ret_from_fork+0x14/0x2c > > > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8) > > > [ 1297.552736] 5fa0: 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 > > > 00000000 00000000 > > > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > > [ 1297.552768] ---[ end trace 0000000000000000 ]--- > > > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, > > > /regulator-vdd-1-8 > > > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset, /fpga- > > > axi/dummy_dev > > > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, > > > /regulator-vio > > > > > > the above is easily reproducible with a dummy platform device that just > > > gets some regulators (devlinks are created at that point). > > > > > > The culprit of the above is actually this line: > > > > > > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366 > > > > > > This also makes me wonder if we should not have an extra patch just printing > > > 'node->full_name' instead of '%pOF'. As seen, we should not make any > > > assumptions > > > (like parent's being around :)) at this stage. > > > > > > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset > > > is already gone so if we ever reach of_node_release() we can proceed > > > normally. Obviously, I'm not really sure about this and that's the whole > > > reason > > > why I'm bringing this as an RFC. This looks like a nasty corner case and I > > > know > > > that adding/removing devices dynamically is far from being easy to handle... > > > > > > Also not sure if the driver core folks should be aware of this... > > > > > > Lastly, one of workarounding this issue is by manually unbiding the device > > > from the driver before removing the overlay which would (potentially) give > > > time for the workers to run. > > > > > > Nuno Sa (1): > > > of: dynamic: allow freeing of_nodes after the overlay changeset > > > > > > drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++---- > > > include/linux/of.h | 1 + > > > 2 files changed, 28 insertions(+), 4 deletions(-) > > > > > > > Very nice problem description, thanks. > > > > You have found a real problem of how the devlink implementation did not > > consider > > the impact on overlays. The overlay removal process does not expect any node > > created by an overlay apply to exist at the moment that the overlay removal > > completes (to be a little more pedantic, the removal process of deletes nodes > > and > > properties occurs in the exact opposite order that they were created. Thus > > even > > partly through the overlay removal, a node is deleted before its parent is > > deleted. Also, all of the properties for any node deleted during the removal > > are deleted (even if the node refcount does not reach zero). > > > > The RFC patch does not attempt to fix the actual problem, it merely suppresses > > reporting the symptom. > > > > -Frank > > > > Hi Frank, > > Thanks for taking a look into this. I was afraid that this was to > straightforward indeed. So I guess you're saying this is something that needs to > be fixed at the devlink level? > > +cc Greg and Rafael for some help and advice :) > > So, I'd very much try to give this a proper fix but this is messing with very > core components so any help would be welcome. The only think that comes to my > mind is to have some kind of syncing between the moment (or even before) the > work is queued and the moment the node is deleted. I guess this would have to > happen at the kobject level... > > Another piece of information is that this is an issue for platform devices and > pretty much for any device that releases the node during it's .dev.release > callback (which BTW, sound correct to me). For example, spi and i2c devices are > releasing the node during the driver - device unbind and not when the device > gets released. This actually gives time for things to settle before reaching the > point of deleting the node (not saying this correct or should be the solution > for this case). > > - Nuno Sá Pinging this one just to understand if this is something that we wanna fix at the driver core level... I'll wait a bit more to see if anyone has some comments that might already help me in going in the right direction. - Nuno Sá