Re: [PATCH 1/1] Bluetooth: iso: Allow BIG re-sync

Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> · Mon, 2 Dec 2024 17:28:59 -0500

Hi Iulia,

On Mon, Dec 2, 2024 at 5:00 PM Luiz Augusto von Dentz
<luiz.dentz@xxxxxxxxx> wrote:
>
> Hi Iulia,
>
> On Thu, Nov 28, 2024 at 10:54 AM Iulia Tanasescu
> <iulia.tanasescu@xxxxxxx> wrote:
> >
> > A Broadcast Sink might require BIG sync to be terminated and
> > re-established multiple times, while keeping the same PA sync
> > handle active. This can be possible if the configuration of the
> > listening (PA sync) socket is reset once all bound BISes are
> > established and accepted by the user space:
> >
> > 1. The DEFER setup flag needs to be reset on the parent socket,
> > to allow another BIG create sync procedure to be started on socket
> > read.
> >
> > 2. The BT_SK_BIG_SYNC flag needs to be cleared on the parent socket,
> > to allow another BIG create sync command to be sent.
> >
> > 3. The socket state needs to transition from BT_LISTEN to BT_CONNECTED,
> > to mark that the listening process has completed and another one can
> > be started if needed.
> >
> > Signed-off-by: Iulia Tanasescu <iulia.tanasescu@xxxxxxx>
> > ---
> >  net/bluetooth/iso.c | 36 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 36 insertions(+)
> >
> > diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
> > index 1b40fd2b2f02..013edb19c4a1 100644
> > --- a/net/bluetooth/iso.c
> > +++ b/net/bluetooth/iso.c
> > @@ -1268,6 +1268,42 @@ static int iso_sock_accept(struct socket *sock, struct socket *newsock,
> >
> >         BT_DBG("new socket %p", ch);
> >
> > +       /* A Broadcast Sink might require BIG sync to be terminated
> > +        * and re-established multiple times, while keeping the same
> > +        * PA sync handle active. To allow this, once all BIS
> > +        * connections have been accepted on a PA sync parent socket,
> > +        * "reset" socket state, to allow future BIG re-sync procedures.
> > +        */
> > +       if (test_bit(BT_SK_PA_SYNC, &iso_pi(sk)->flags)) {
> > +               /* Iterate through the list of bound BIS indices
> > +                * and clear each BIS as they are accepted by the
> > +                * user space, one by one.
> > +                */
> > +               for (int i = 0; i < iso_pi(sk)->bc_num_bis; i++) {
> > +                       if (iso_pi(sk)->bc_bis[i] > 0) {
> > +                               iso_pi(sk)->bc_bis[i] = 0;
> > +                               iso_pi(sk)->bc_num_bis--;
> > +                               break;
> > +                       }
> > +               }
> > +
> > +               if (iso_pi(sk)->bc_num_bis == 0) {
> > +                       /* Once the last BIS was accepted, reset parent
> > +                        * socket parameters to mark that the listening
> > +                        * process for BIS connections has been completed:
> > +                        *
> > +                        * 1. Reset the DEFER setup flag on the parent sk.
> > +                        * 2. Clear the flag marking that the BIG create
> > +                        *    sync command is pending.
> > +                        * 3. Transition socket state from BT_LISTEN to
> > +                        *    BT_CONNECTED.
> > +                        */
> > +                       set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
> > +                       clear_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags);
> > +                       sk->sk_state = BT_CONNECTED;
> > +               }
> > +       }
> > +
> >  done:
> >         release_sock(sk);
> >         return err;
> > --
> > 2.43.0
>
> While testing this Ive run into the following problem:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.12.0-rc6-g47ebf099106e #7428 Not tainted
> ------------------------------------------------------
> kworker/u5:2/38 is trying to acquire lock:
> ffff88800224a248 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
> iso_connect_cfm+0x563/0x13e0
>
> but task is already holding lock:
> ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at:
> hci_le_per_adv_report_evt+0x494/0x6d0
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (hci_cb_list_lock){+.+.}-{4:4}:
>        lock_acquire+0x1b6/0x510
>        __mutex_lock+0x180/0x1a60
>        hci_le_per_adv_report_evt+0x494/0x6d0
>        hci_event_packet+0x54a/0xec0
>        hci_rx_work+0x76c/0x11c0
>        process_one_work+0x7d9/0x13d0
>        worker_thread+0x5b7/0xf90
>        kthread+0x293/0x360
>        ret_from_fork+0x2f/0x70
>        ret_from_fork_asm+0x1a/0x30
>
> -> #1 (&hdev->lock){+.+.}-{4:4}:
>        lock_acquire+0x1b6/0x510
>        __mutex_lock+0x180/0x1a60
>        iso_sock_listen+0x2d4/0xe00
>        __sys_listen+0x163/0x240
>        __x64_sys_listen+0x4e/0x70
>        do_syscall_64+0x71/0x140
>        entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
>        check_prev_add+0x1b5/0x23f0
>        __lock_acquire+0x2bdf/0x4580
>        lock_acquire+0x1b6/0x510
>        lock_sock_nested+0x36/0xd0
>        iso_connect_cfm+0x563/0x13e0
>        hci_le_per_adv_report_evt+0x4f8/0x6d0
>        hci_event_packet+0x54a/0xec0
>        hci_rx_work+0x76c/0x11c0
>        process_one_work+0x7d9/0x13d0
>        worker_thread+0x5b7/0xf90
>        kthread+0x293/0x360
>        ret_from_fork+0x2f/0x70
>        ret_from_fork_asm+0x1a/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
>   sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> &hdev->lock --> hci_cb_list_lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(hci_cb_list_lock);
>                                lock(&hdev->lock);
>                                lock(hci_cb_list_lock);
>   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
>
>  *** DEADLOCK ***
>
> 4 locks held by kworker/u5:2/38:
>  #0: ffff888002213948 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
> process_one_work+0xce8/0x13d0
>  #1: ffff8880024a7da0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
> at: process_one_work+0x758/0x13d0
>  #2: ffff888002300078 (&hdev->lock){+.+.}-{4:4}, at:
> hci_le_per_adv_report_evt+0x152/0x6d0
>  #3: ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at:
> hci_le_per_adv_report_evt+0x494/0x6d0
>
> While this doesn't seem to be new I think it is a good idea to try and
> solve it so we don't run into deadlocks.

Looks like this is due our usage of a mutex as locking for
hci_cb_list_lock, we could perhaps use a rwlock_t like its done for
hci_dev_list_lock but I think we may still run into possible unsafe
locking if there are multiple writers, so perhaps we could attempt to
convert use RCU which does allows us to do certain operations
lockless:

https://docs.kernel.org/next/RCU/listRCU.html

>
> --
> Luiz Augusto von Dentz

-- 
Luiz Augusto von Dentz