> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Saturday, October 5, 2024 3:21 AM
> To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> Cc: leon@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [syzbot] [rdma?] possible deadlock in
> siw_create_listen (2)
>
> On Fri, Oct 04, 2024 at 04:10:31PM +0000, Bernard Metzler wrote:
>
> > Could one please help me to understand this situation?
> > cma.c:5354
> >
> >     mutex_lock(&lock);
> >     list_add_tail(&cma_dev->list, &dev_list);
> >     list_for_each_entry(id_priv, &listen_any_list, listen_any_item) {
> >         ret = cma_listen_on_dev(id_priv, cma_dev, &to_destroy);
> >         if (ret)
> >             goto free_listen;
> >     }
> >     mutex_unlock(&lock);
> >
> > siw_cm.c:1776
> >     sock_set_reuseaddr(s->sk);
> >
> > ...which calls lock_sock(sk) on a freshly created socket.
>
> I think this is a smc bug, and lockdep is getting confused about what
> to report due to all the different locks.
>
> smc_setsockopt() eventually in ip_setsockopt() does:
>
>     mutex_lock(&smc->clcsock_release_lock);
>
>         if (needs_rtnl)
>             rtnl_lock();
>         sockopt_lock_sock(sk);
>     mutex_unlock(&smc->clcsock_release_lock);
>
>
> smc_sendmsg() does
>
>     lock_sock(sk);
>     mutex_lock(&smc->clcsock_release_lock);
>
> Which is classic deadlock locking.
>

Thank you for helping to clarify this. That would make much more sense.
So blaming
> siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776
... isn't quite right.
It doesn't deal with the SMC lock, but locks a just created socket via
>> -> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
>>        check_prev_add kernel/locking/lockdep.c:3133 [inline]
>>        check_prevs_add kernel/locking/lockdep.c:3252 [inline]
>>        validate_chain kernel/locking/lockdep.c:3868 [inline]
>>        __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
>>        lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
>>        lock_sock_nested net/core/sock.c:3543 [inline]
>>        lock_sock include/net/sock.h:1607 [inline]
>>        sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
>>        siw_create_listen+0x164/0xd70

> That the CMA gets involved here seems like wrong reporting because
> syzkaller put those lock chains into it.
>
> I guess this is a dup of
>
> https://lore.kernel.org/netdev/00000000000093078f0622583e6e@google.com/T/
>
> Or at least that should be fixed before looking at this
>

Sounds reasonable...

Thanks!
Bernard.
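
P.S. For anyone following along, the lock-order inversion described above
can be sketched in user-space C. This is only an illustration under stated
assumptions, not how lockdep or the SMC code is implemented: the enum
values are hypothetical stand-ins for the three locks named in the quoted
snippets, and every helper (record_path, has_inversion, and so on) is
invented for this example. Each code path is recorded as "lock B was taken
while lock A was held"; a cycle in that graph means two paths can deadlock
against each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the quoted code. */
enum lock_class { SK_LOCK, CLCSOCK_RELEASE_LOCK, RTNL, NR_CLASSES };

/* edge[a][b] is true if some recorded path took b while holding a. */
struct order_graph {
    bool edge[NR_CLASSES][NR_CLASSES];
};

static void graph_init(struct order_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/*
 * Record one code path. seq lists the locks in acquisition order and
 * assumes every earlier lock is still held when a later one is taken.
 */
static void record_path(struct order_graph *g,
                        const enum lock_class *seq, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            assert(seq[i] < NR_CLASSES && seq[j] < NR_CLASSES);
            g->edge[seq[i]][seq[j]] = true;
        }
    }
}

/* Depth-first search: can we get from 'from' to 'to' along edges? */
static bool reaches(const struct order_graph *g, int from, int to,
                    bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++) {
        if (g->edge[from][next] && !seen[next] &&
            reaches(g, next, to, seen))
            return true;
    }
    return false;
}

/* True if the recorded orderings contain a cycle (possible deadlock). */
static bool has_inversion(const struct order_graph *g)
{
    for (int a = 0; a < NR_CLASSES; a++) {
        for (int b = 0; b < NR_CLASSES; b++) {
            bool seen[NR_CLASSES] = { false };

            if (g->edge[a][b] && reaches(g, b, a, seen))
                return true;
        }
    }
    return false;
}

/* The two paths from the quoted mail, recorded into one graph. */
static bool smc_example_has_inversion(void)
{
    /* setsockopt side; rtnl is only taken if needs_rtnl is set. */
    const enum lock_class setsockopt_path[] = {
        CLCSOCK_RELEASE_LOCK, RTNL, SK_LOCK
    };
    /* sendmsg side: socket lock first, then the release lock. */
    const enum lock_class sendmsg_path[] = {
        SK_LOCK, CLCSOCK_RELEASE_LOCK
    };
    struct order_graph g;

    graph_init(&g);
    record_path(&g, setsockopt_path, 3);
    record_path(&g, sendmsg_path, 2);
    return has_inversion(&g);
}
```

Recording only the setsockopt-style path gives no cycle; adding the
sendmsg-style path closes the loop between the socket lock and
clcsock_release_lock, which is the inversion pointed out in the quoted
mail. The siw path only takes the socket lock on a new socket, so it adds
no ordering of its own between those two locks.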