Hi Luiz Augusto von Dentz, On Mon, 24 Jun 2024 09:36:14 -0400, Luiz Augusto von Dentz wrote: > > > Looks like this was never really tested properly: > > > > > > ============================================ > > > WARNING: possible recursive locking detected > > > 6.10.0-rc3-g4029dba6b6f1 #6823 Not tainted > > > -------------------------------------------- > > > kworker/u5:0/35 is trying to acquire lock: > > > ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: > > > l2cap_sock_recv_cb+0x44/0x1e0 > > > > > > but task is already holding lock: > > > ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: > > > l2cap_get_chan_by_scid+0xaf/0xd0 > > > > > > other info that might help us debug this: > > > Possible unsafe locking scenario: > > > > > > CPU0 > > > ---- > > > lock(&chan->lock#2/1); > > > lock(&chan->lock#2/1); > > > > > > *** DEADLOCK *** > > > > > > May be due to missing lock nesting notation > > > > > > 3 locks held by kworker/u5:0/35: > > > #0: ffff888002b8a940 ((wq_completion)hci0#2){+.+.}-{0:0}, at: > > > process_one_work+0x750/0x930 > > > #1: ffff888002c67dd0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, > > > at: process_one_work+0x44e/0x930 > > > #2: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: > > > l2cap_get_chan_by_scid+0xaf/0xd0 > > > > > > l2cap_sock_recv_cb is assumed to be called with the chan_lock held so > > > perhaps we can just do: > > > > > > sk = chan->data; > > > if (!sk) > > > return -ENXIO; > > > > If the release occurs after this judgment, the same problem will still occur. > > Recv and release must be synchronized using locks, which can be solved by > > adding new lock. > > > > Please use the new patch https://syzkaller.appspot.com/text?tag=Patch&x=15d2c48e980000, I have tested in > > https://syzkaller.appspot.com/bug?extid=b7f6f8c9303466e16c8a > > Hmm, why don't we just fix l2cap_conless_channel? Btw, > l2cap_conless_channel is normally not used by any profiles thus why > there isn't any CI covering it, on the other hand l2cap_data_channel > is used by 99% of the profiles. Yes, we can fix l2cap_conless_channel, but key point is that "chan->lock" cannot be used to synchronize l2cap_conless_channel and l2cap_sock_release, otherwise it will form an AA lock with l2cap_data_channel. Why not fix it in l2cap_conless_channel but in l2cap_sock_recv_cb, because l2cap_sock_recv_cb is on the same layer as l2cap_sock_kill, using a new lock for synchronization is more appropriate. -- Edward