On Mon, Nov 22, 2021 at 08:32:53PM +0800, Wen Gu wrote:
> Possible recursive locking is detected by lockdep when SMC
> falls back to TCP. The corresponding warnings are as follows:
>
>  ============================================
>  WARNING: possible recursive locking detected
>  5.16.0-rc1+ #18 Tainted: G    E
>  --------------------------------------------
>  wrk/1391 is trying to acquire lock:
>  ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
>
>  but task is already holding lock:
>  ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(&ei->socket.wq.wait);
>    lock(&ei->socket.wq.wait);
>
>   *** DEADLOCK ***
>
>   May be due to missing lock nesting notation
>
>  2 locks held by wrk/1391:
>   #0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
>   #1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
>
>  stack backtrace:
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x56/0x7b
>   __lock_acquire+0x951/0x11f0
>   lock_acquire+0x27a/0x320
>   ? smc_switch_to_fallback+0x109/0x250 [smc]
>   ? smc_switch_to_fallback+0xfe/0x250 [smc]
>   _raw_spin_lock_irq+0x3b/0x80
>   ? smc_switch_to_fallback+0x109/0x250 [smc]
>   smc_switch_to_fallback+0x109/0x250 [smc]
>   smc_connect_fallback+0xe/0x30 [smc]
>   __smc_connect+0xcf/0x1090 [smc]
>   ? mark_held_locks+0x61/0x80
>   ? __local_bh_enable_ip+0x77/0xe0
>   ? lockdep_hardirqs_on+0xbf/0x130
>   ? smc_connect+0x12a/0x150 [smc]
>   smc_connect+0x12a/0x150 [smc]
>   __sys_connect+0x8a/0xc0
>   ? syscall_enter_from_user_mode+0x20/0x70
>   __x64_sys_connect+0x16/0x20
>   do_syscall_64+0x34/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The nested locking in smc_switch_to_fallback() is flagged as a
> possible deadlock because smc_wait->lock and clc_wait->lock belong
> to the same lock class. It is actually safe so far, since no other
> code path tries to obtain smc_wait->lock while holding
> clc_wait->lock. So this patch replaces spin_lock() with
> spin_lock_nested() to avoid the false report by lockdep.
>
> Link: https://lkml.org/lkml/2021/11/19/962
> Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback")
> Reported-by: syzbot+e979d3597f48262cb4ee@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>

Acked-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>

> ---
>  net/smc/af_smc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index b61c802..2692cba 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -585,7 +585,7 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
>  	 * to clcsocket->wq during the fallback.
>  	 */
>  	spin_lock_irqsave(&smc_wait->lock, flags);
> -	spin_lock(&clc_wait->lock);
> +	spin_lock_nested(&clc_wait->lock, SINGLE_DEPTH_NESTING);
>  	list_splice_init(&smc_wait->head, &clc_wait->head);
>  	spin_unlock(&clc_wait->lock);
>  	spin_unlock_irqrestore(&smc_wait->lock, flags);
> --
> 1.8.3.1
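
For anyone new to lockdep's nesting annotation, here is a minimal
sketch of the pattern the patch uses (the struct and function names
below are illustrative, not taken from the kernel). Lockdep keys
locks by class rather than by instance, and all locks initialized
from the same spin_lock_init() call site share one class, so taking
a second instance of that class looks recursive unless the nesting
is annotated:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct bucket {
    	spinlock_t lock;	/* every bucket's lock shares one class */
    	struct list_head head;
    };

    static void bucket_init(struct bucket *b)
    {
    	spin_lock_init(&b->lock);
    	INIT_LIST_HEAD(&b->head);
    }

    /* Move all entries from one bucket to another, holding both locks. */
    static void bucket_splice(struct bucket *from, struct bucket *to)
    {
    	unsigned long flags;

    	spin_lock_irqsave(&from->lock, flags);
    	/*
    	 * to->lock is in the same lock class as from->lock, so a
    	 * plain spin_lock() here would trigger the "possible
    	 * recursive locking" splat. SINGLE_DEPTH_NESTING marks this
    	 * as a deliberate, bounded one-level nesting of a distinct
    	 * lock instance.
    	 */
    	spin_lock_nested(&to->lock, SINGLE_DEPTH_NESTING);
    	list_splice_init(&from->head, &to->head);
    	spin_unlock(&to->lock);
    	spin_unlock_irqrestore(&from->lock, flags);
    }

Note that spin_lock_nested() is purely an annotation: with lockdep
disabled it degrades to a plain spin_lock(). So, as the commit
message says, correctness still relies on no other path taking the
two locks in the opposite order.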