Hi Edward, On Fri, Jun 14, 2024 at 9:46 PM Edward Adam Davis <eadavis@xxxxxx> wrote: > > The problem occurs between the system call to close the sock and hci_rx_work, > where the former releases the sock and the latter accesses it without lock protection. > > CPU0 CPU1 > ---- ---- > sock_close hci_rx_work > l2cap_sock_release hci_acldata_packet > l2cap_sock_kill l2cap_recv_frame > sk_free l2cap_conless_channel > l2cap_sock_recv_cb > > If hci_rx_work processes the data that needs to be received before the sock is > closed, then everything is normal; Otherwise, the work thread may access the > released sock when receiving data. > > Add a chan mutex in the rx callback of the sock to achieve synchronization between > the sock release and recv cb. > > Sock is dead, so set chan data to NULL, avoid others use invalid sock pointer. > > Reported-and-tested-by: syzbot+b7f6f8c9303466e16c8a@xxxxxxxxxxxxxxxxxxxxxxxxx > Signed-off-by: Edward Adam Davis <eadavis@xxxxxx> > --- > net/bluetooth/l2cap_sock.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c > index 6db60946c627..f45cdf9bc985 100644 > --- a/net/bluetooth/l2cap_sock.c > +++ b/net/bluetooth/l2cap_sock.c > @@ -1239,6 +1239,10 @@ static void l2cap_sock_kill(struct sock *sk) > > BT_DBG("sk %p state %s", sk, state_to_string(sk->sk_state)); > > + /* Sock is dead, so set chan data to NULL, avoid other task use invalid > + * sock pointer. > + */ > + l2cap_pi(sk)->chan->data = NULL; > /* Kill poor orphan */ > > l2cap_chan_put(l2cap_pi(sk)->chan); > @@ -1481,12 +1485,25 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan) > > static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb) > { > - struct sock *sk = chan->data; > - struct l2cap_pinfo *pi = l2cap_pi(sk); > + struct sock *sk; > + struct l2cap_pinfo *pi; > int err; > > - lock_sock(sk); > + /* To avoid race with sock_release, a chan lock needs to be added here > + * to synchronize the sock. > + */ > + l2cap_chan_hold(chan); > + l2cap_chan_lock(chan); > + sk = chan->data; > > + if (!sk) { > + l2cap_chan_unlock(chan); > + l2cap_chan_put(chan); > + return -ENXIO; > + } > + > + pi = l2cap_pi(sk); > + lock_sock(sk); > if (chan->mode == L2CAP_MODE_ERTM && !list_empty(&pi->rx_busy)) { > err = -ENOMEM; > goto done; > @@ -1535,6 +1552,8 @@ static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb) > > done: > release_sock(sk); > + l2cap_chan_unlock(chan); > + l2cap_chan_put(chan); > > return err; > } > -- > 2.43.0 Looks like this was never really tested properly: ============================================ WARNING: possible recursive locking detected 6.10.0-rc3-g4029dba6b6f1 #6823 Not tainted -------------------------------------------- kworker/u5:0/35 is trying to acquire lock: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_sock_recv_cb+0x44/0x1e0 but task is already holding lock: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_get_chan_by_scid+0xaf/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&chan->lock#2/1); lock(&chan->lock#2/1); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/u5:0/35: #0: ffff888002b8a940 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x750/0x930 #1: ffff888002c67dd0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work+0x44e/0x930 #2: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at: l2cap_get_chan_by_scid+0xaf/0xd0 l2cap_sock_recv_cb is assumed to be called with the chan_lock held so perhaps we can just do: sk = chan->data; if (!sk) return -ENXIO; -- Luiz Augusto von Dentz