Re: [PATCH v3 2/4] i2c: Replace list-based mechanism for handling auto-detected clients

Hi,

I hit a deadlock with this patch in v6.14-rc1.

On Fri, 1 Nov 2024 23:09:51 +0100
Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:

> So far a list is used to track auto-detected clients per driver.
> The same functionality can be achieved much simpler by flagging
> auto-detected clients.
> 
> Two notes regarding the usage of driver_for_each_device:
> In our case it can't fail, however the function is annotated __must_check.
> So a little workaround is needed to avoid a compiler warning.
> Then we may remove nodes from the list over which we iterate.
> This is safe, see the explanation at the beginning of lib/klist.c.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx>
> ---
> v3:
> - protect client removal with core_lock mutex
> ---
>  drivers/i2c/i2c-core-base.c | 52 ++++++++++++-------------------------
>  include/linux/i2c.h         |  3 +--
>  2 files changed, 17 insertions(+), 38 deletions(-)
> 
...

> @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap)
>  	 * we can't remove the dummy devices during the first pass: they
>  	 * could have been instantiated by real devices wishing to clean
>  	 * them up properly, so we give them a chance to do that first. */
> +	mutex_lock(&core_lock);
>  	device_for_each_child(&adap->dev, NULL, __unregister_client);
>  	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
> +	mutex_unlock(&core_lock);
>  

Calling __unregister_client() with the core_lock mutex held leads to a
deadlock in my case:

    # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
    [  242.928264] 
    [  242.929779] ============================================
    [  242.935092] WARNING: possible recursive locking detected
    [  242.940406] 6.14.0-rc1+ #22 Not tainted
    [  242.944245] --------------------------------------------
    [  242.949556] sh/299 is trying to acquire lock:
    [  242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0
    [  242.961689] 
    [  242.961689] but task is already holding lock:
    [  242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
    [  242.975285] 
    [  242.975285] other info that might help us debug this:
    [  242.981814]  Possible unsafe locking scenario:
    [  242.981814] 
    [  242.987732]        CPU0
    [  242.990179]        ----
    [  242.992625]   lock(core_lock);
    [  242.995686]   lock(core_lock);
    [  242.998748] 
    [  242.998748]  *** DEADLOCK ***
    [  242.998748] 
    [  243.004666]  May be due to missing lock nesting notation
    [  243.004666] 
    [  243.011455] 5 locks held by sh/299:
    [  243.014946]  #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398
    [  243.022976]  #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8
    [  243.031962]  #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
    [  243.041645]  #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
    [  243.049845]  #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
    [  243.059522] 
    [  243.059522] stack backtrace:
    [  243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22
    [  243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT)
    [  243.077032] Call trace:
    [  243.079481]  show_stack+0x20/0x38 (C)
    [  243.083152]  dump_stack_lvl+0x90/0xd0
    [  243.086819]  dump_stack+0x18/0x28
    [  243.090140]  print_deadlock_bug+0x260/0x350
    [  243.094332]  __lock_acquire+0x113c/0x2180
    [  243.098346]  lock_acquire+0x1c4/0x350
    [  243.102015]  __mutex_lock+0x9c/0x500
    [  243.105599]  mutex_lock_nested+0x2c/0x40
    [  243.109528]  i2c_del_adapter+0x44/0x1b0
    [  243.113371]  i2c_mux_del_adapters+0xa0/0x100
    [  243.117649]  pca954x_cleanup+0x98/0xd0
    [  243.121406]  pca954x_remove+0x38/0x50
    [  243.125078]  i2c_device_remove+0x34/0xb8
    [  243.129007]  device_remove+0x54/0x90
    [  243.132590]  device_release_driver_internal+0x1e8/0x250
    [  243.137824]  device_release_driver+0x20/0x38
    [  243.142101]  bus_remove_device+0xd4/0x120
    [  243.146116]  device_del+0x14c/0x410
    [  243.149612]  device_unregister+0x20/0x48
    [  243.153540]  i2c_unregister_device.part.0+0x50/0x88
    [  243.158427]  __unregister_client+0x74/0x80
    [  243.162530]  device_for_each_child+0x68/0xc8
    [  243.166811]  i2c_del_adapter+0xb8/0x1b0
    [  243.170653]  i2c_imx_remove+0x4c/0x190
    [  243.174412]  platform_remove+0x30/0x58
    [  243.178167]  device_remove+0x54/0x90
    [  243.181751]  device_release_driver_internal+0x1e8/0x250
    [  243.186982]  device_driver_detach+0x20/0x38
    [  243.191172]  unbind_store+0xbc/0xc8
    ...

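When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called
recursively. The first call is for the 30a40000.i2c SoC adapter. The second,
reached from pca954x_remove() through i2c_mux_del_adapters(), is for an
adapter of the i2c mux connected on that bus, and it tries to take core_lock
while the first call still holds it.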

My device tree looks like this:
	i2c@30a40000 {
		compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c";
		...
		i2c-mux@70 {
			compatible = "nxp,pca9543";
			...
			i2c@0 {
				...
				touchscreen@2a {
					compatible = "eeti,exc80h60";
					...
				};
			};
			
			i2c@1 {
				...
			};
		};
	};
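
For illustration only, the recursion described above can be modelled in
userspace. This is a minimal sketch, not kernel code: struct adapter,
core_lock_init(), del_adapter() and the hold_lock_over_children flag are
hypothetical names, and core_lock here is a pthread error-checking mutex
standing in for the i2c core mutex, so that a second lock attempt by the same
thread returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the i2c core_lock (assumption, not kernel code). */
static pthread_mutex_t core_lock;

/* Hypothetical model: an adapter whose client (a mux) owns a child adapter. */
struct adapter {
	const char *name;
	const struct adapter *child;	/* NULL when there is no mux below */
};

/* Set up core_lock as an error-checking mutex; returns 0 on success. */
static int core_lock_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret != 0)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (ret == 0)
		ret = pthread_mutex_init(&core_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/*
 * Model of adapter removal. Removing the adapter removes its client, whose
 * remove path deletes the child adapter, i.e. calls del_adapter() again.
 * With hold_lock_over_children set, core_lock is still held during that
 * nested call, as in the quoted hunk; otherwise it is dropped first.
 */
static int del_adapter(const struct adapter *adap, int hold_lock_over_children)
{
	int ret;

	assert(adap != NULL);

	ret = pthread_mutex_lock(&core_lock);
	if (ret != 0)
		return ret;	/* EDEADLK: this thread already holds core_lock */

	if (!hold_lock_over_children)
		pthread_mutex_unlock(&core_lock);

	ret = adap->child ? del_adapter(adap->child, hold_lock_over_children) : 0;

	if (hold_lock_over_children)
		pthread_mutex_unlock(&core_lock);
	return ret;
}
```

With a root adapter whose child is a mux adapter, calling
del_adapter(&root, 1) after core_lock_init() reports EDEADLK from the nested
call, while del_adapter(&root, 0) returns 0. An adapter without a mux below
it is removed fine either way, which may be why the problem only shows up on
topologies like mine.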


Should the core_lock mutex be held across both the __unregister_client() and
__unregister_dummy() calls?

Best regards,
Hervé Codina
