Re: [PATCH v2] bluetooth/l2cap: sync sock recv cb and release

Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> · Thu, 20 Jun 2024 12:53:19 -0400

Hi Edward,

On Fri, Jun 14, 2024 at 9:46 PM Edward Adam Davis <eadavis@xxxxxx> wrote:
>
> The problem occurs between the system call to close the sock and hci_rx_work,
> where the former releases the sock and the latter accesses it without lock protection.
>
>            CPU0                       CPU1
>            ----                       ----
>            sock_close                 hci_rx_work
>            l2cap_sock_release         hci_acldata_packet
>            l2cap_sock_kill            l2cap_recv_frame
>            sk_free                    l2cap_conless_channel
>                                       l2cap_sock_recv_cb
>
> If hci_rx_work processes the data that needs to be received before the sock is
> closed, then everything is normal; Otherwise, the work thread may access the
> released sock when receiving data.
>
> Add a chan mutex in the rx callback of the sock to achieve synchronization between
> the sock release and recv cb.
>
> Sock is dead, so set chan data to NULL, avoid others use invalid sock pointer.
>
> Reported-and-tested-by: syzbot+b7f6f8c9303466e16c8a@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Edward Adam Davis <eadavis@xxxxxx>
> ---
>  net/bluetooth/l2cap_sock.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index 6db60946c627..f45cdf9bc985 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -1239,6 +1239,10 @@ static void l2cap_sock_kill(struct sock *sk)
>
>         BT_DBG("sk %p state %s", sk, state_to_string(sk->sk_state));
>
> +       /* Sock is dead, so set chan data to NULL, avoid other task use invalid
> +        * sock pointer.
> +        */
> +       l2cap_pi(sk)->chan->data = NULL;
>         /* Kill poor orphan */
>
>         l2cap_chan_put(l2cap_pi(sk)->chan);
> @@ -1481,12 +1485,25 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan)
>
>  static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb)
>  {
> -       struct sock *sk = chan->data;
> -       struct l2cap_pinfo *pi = l2cap_pi(sk);
> +       struct sock *sk;
> +       struct l2cap_pinfo *pi;
>         int err;
>
> -       lock_sock(sk);
> +       /* To avoid race with sock_release, a chan lock needs to be added here
> +        * to synchronize the sock.
> +        */
> +       l2cap_chan_hold(chan);
> +       l2cap_chan_lock(chan);
> +       sk = chan->data;
>
> +       if (!sk) {
> +               l2cap_chan_unlock(chan);
> +               l2cap_chan_put(chan);
> +               return -ENXIO;
> +       }
> +
> +       pi = l2cap_pi(sk);
> +       lock_sock(sk);
>         if (chan->mode == L2CAP_MODE_ERTM && !list_empty(&pi->rx_busy)) {
>                 err = -ENOMEM;
>                 goto done;
> @@ -1535,6 +1552,8 @@ static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb)
>
>  done:
>         release_sock(sk);
> +       l2cap_chan_unlock(chan);
> +       l2cap_chan_put(chan);
>
>         return err;
>  }
> --
> 2.43.0

Looks like this was never really tested properly:

============================================
WARNING: possible recursive locking detected
6.10.0-rc3-g4029dba6b6f1 #6823 Not tainted
--------------------------------------------
kworker/u5:0/35 is trying to acquire lock:
ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at:
l2cap_sock_recv_cb+0x44/0x1e0

but task is already holding lock:
ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at:
l2cap_get_chan_by_scid+0xaf/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&chan->lock#2/1);
  lock(&chan->lock#2/1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u5:0/35:
 #0: ffff888002b8a940 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
process_one_work+0x750/0x930
 #1: ffff888002c67dd0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
at: process_one_work+0x44e/0x930
 #2: ffff888002ec2510 (&chan->lock#2/1){+.+.}-{3:3}, at:
l2cap_get_chan_by_scid+0xaf/0xd0

l2cap_sock_recv_cb is assumed to be called with the chan_lock held so
perhaps we can just do:

       sk = chan->data;
       if (!sk)
               return -ENXIO;

-- 
Luiz Augusto von Dentz