On 04.02.2025 16:24, Herve Codina wrote: > Hi, > > Got a deadlock issue with this patch in v6.14-rc1. > > On Fri, 1 Nov 2024 23:09:51 +0100 > Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote: > >> So far a list is used to track auto-detected clients per driver. >> The same functionality can be achieved much simpler by flagging >> auto-detected clients. >> >> Two notes regarding the usage of driver_for_each_device: >> In our case it can't fail, however the function is annotated __must_check. >> So a little workaround is needed to avoid a compiler warning. >> Then we may remove nodes from the list over which we iterate. >> This is safe, see the explanation at the beginning of lib/klist.c. >> >> Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx> >> --- >> v3: >> - protect client removal with core_lock mutex >> --- >> drivers/i2c/i2c-core-base.c | 52 ++++++++++++------------------------- >> include/linux/i2c.h | 3 +-- >> 2 files changed, 17 insertions(+), 38 deletions(-) >> > ... > >> @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap) >> * we can't remove the dummy devices during the first pass: they >> * could have been instantiated by real devices wishing to clean >> * them up properly, so we give them a chance to do that first. */ >> + mutex_lock(&core_lock); >> device_for_each_child(&adap->dev, NULL, __unregister_client); >> device_for_each_child(&adap->dev, NULL, __unregister_dummy); >> + mutex_unlock(&core_lock); >> > > Calling __unregister_client() with core_lock mutex held leads to a deadlock > in my case: > > # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind > [ 242.928264] > [ 242.929779] ============================================ > [ 242.935092] WARNING: possible recursive locking detected > [ 242.940406] 6.14.0-rc1+ #22 Not tainted > [ 242.944245] -------------------------------------------- > [ 242.949556] sh/299 is trying to acquire lock: > [ 242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0 > [ 242.961689] > [ 242.961689] but task is already holding lock: > [ 242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 > [ 242.975285] > [ 242.975285] other info that might help us debug this: > [ 242.981814] Possible unsafe locking scenario: > [ 242.981814] > [ 242.987732] CPU0 > [ 242.990179] ---- > [ 242.992625] lock(core_lock); > [ 242.995686] lock(core_lock); > [ 242.998748] > [ 242.998748] *** DEADLOCK *** > [ 242.998748] > [ 243.004666] May be due to missing lock nesting notation > [ 243.004666] > [ 243.011455] 5 locks held by sh/299: > [ 243.014946] #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398 > [ 243.022976] #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8 > [ 243.031962] #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 > [ 243.041645] #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 > [ 243.049845] #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 > [ 243.059522] > [ 243.059522] stack backtrace: > [ 243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22 > [ 243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT) > [ 243.077032] Call trace: > [ 243.079481] show_stack+0x20/0x38 (C) > [ 243.083152] dump_stack_lvl+0x90/0xd0 > [ 243.086819] dump_stack+0x18/0x28 > [ 243.090140] print_deadlock_bug+0x260/0x350 > [ 243.094332] __lock_acquire+0x113c/0x2180 > [ 243.098346] lock_acquire+0x1c4/0x350 > [ 243.102015] __mutex_lock+0x9c/0x500 > [ 243.105599] mutex_lock_nested+0x2c/0x40 > [ 243.109528] i2c_del_adapter+0x44/0x1b0 > [ 243.113371] i2c_mux_del_adapters+0xa0/0x100 > [ 243.117649] pca954x_cleanup+0x98/0xd0 > [ 243.121406] pca954x_remove+0x38/0x50 > [ 243.125078] i2c_device_remove+0x34/0xb8 > [ 243.129007] device_remove+0x54/0x90 > [ 243.132590] device_release_driver_internal+0x1e8/0x250 > [ 243.137824] device_release_driver+0x20/0x38 > [ 243.142101] bus_remove_device+0xd4/0x120 > [ 243.146116] device_del+0x14c/0x410 > [ 243.149612] device_unregister+0x20/0x48 > [ 243.153540] i2c_unregister_device.part.0+0x50/0x88 > [ 243.158427] __unregister_client+0x74/0x80 > [ 243.162530] device_for_each_child+0x68/0xc8 > [ 243.166811] i2c_del_adapter+0xb8/0x1b0 > [ 243.170653] i2c_imx_remove+0x4c/0x190 > [ 243.174412] platform_remove+0x30/0x58 > [ 243.178167] device_remove+0x54/0x90 > [ 243.181751] device_release_driver_internal+0x1e8/0x250 > [ 243.186982] device_driver_detach+0x20/0x38 > [ 243.191172] unbind_store+0xbc/0xc8 > ... > > When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called > recursively. The first call is for the 30a40000.i2c SoC adapter and the > second one for an i2c mux connected on the i2c bus. > > My device-tree looks like this: > i2c@30a40000 { > compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c"; > ... > i2c-mux@70 { > compatible = "nxp,pca9543"; > ... > i2c@0 { > ... > touchscreen@2a { > compatible = "eeti,exc80h60"; > ... > }; > }; > > i2c@1 { > ... > }; > }; > }; > > > Should the core_lock mutex be taken when both __unregister_client() and > __unregister_dummy() are called ? > Thanks for the report! I just submitted a fix, for now as RFC. If fixes the deadlock and uses a new approach to prevent the race which caused us to acquire the core lock in few place. Could you please re-test? > Best regards, > Hervé Codina Heiner