Re: [PATCH 1/1] Bluetooth: iso: Allow BIG re-sync

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Luiz,

> -----Original Message-----
> From: Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx>
> Sent: Tuesday, December 3, 2024 12:29 AM
> To: Iulia Tanasescu <iulia.tanasescu@xxxxxxx>
> Cc: linux-bluetooth@xxxxxxxxxxxxxxx; Claudia Cristina Draghicescu
> <claudia.rosu@xxxxxxx>; Mihai-Octavian Urzica <mihai-
> octavian.urzica@xxxxxxx>; Andrei Istodorescu
> <andrei.istodorescu@xxxxxxx>
> Subject: Re: [PATCH 1/1] Bluetooth: iso: Allow BIG re-sync
> 
> Caution: This is an external email. Please take care when clicking links or
> opening attachments. When in doubt, report the message using the 'Report
> this email' button
> 
> 
> Hi Iulia,
> 
> On Mon, Dec 2, 2024 at 5:00 PM Luiz Augusto von Dentz
> <luiz.dentz@xxxxxxxxx> wrote:
> >
> > Hi Iulia,
> >
> > On Thu, Nov 28, 2024 at 10:54 AM Iulia Tanasescu
> > <iulia.tanasescu@xxxxxxx> wrote:
> > >
> > > A Broadcast Sink might require BIG sync to be terminated and
> > > re-established multiple times, while keeping the same PA sync handle
> > > active. This can be possible if the configuration of the listening
> > > (PA sync) socket is reset once all bound BISes are established and
> > > accepted by the user space:
> > >
> > > 1. The DEFER setup flag needs to be reset on the parent socket, to
> > > allow another BIG create sync procedure to be started on socket
> > > read.
> > >
> > > 2. The BT_SK_BIG_SYNC flag needs to be cleared on the parent socket,
> > > to allow another BIG create sync command to be sent.
> > >
> > > 3. The socket state needs to transition from BT_LISTEN to
> > > BT_CONNECTED, to mark that the listening process has completed and
> > > another one can be started if needed.
> > >
> > > Signed-off-by: Iulia Tanasescu <iulia.tanasescu@xxxxxxx>
> > > ---
> > >  net/bluetooth/iso.c | 36 ++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 36 insertions(+)
> > >
> > > diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index
> > > 1b40fd2b2f02..013edb19c4a1 100644
> > > --- a/net/bluetooth/iso.c
> > > +++ b/net/bluetooth/iso.c
> > > @@ -1268,6 +1268,42 @@ static int iso_sock_accept(struct socket
> > > *sock, struct socket *newsock,
> > >
> > >         BT_DBG("new socket %p", ch);
> > >
> > > +       /* A Broadcast Sink might require BIG sync to be terminated
> > > +        * and re-established multiple times, while keeping the same
> > > +        * PA sync handle active. To allow this, once all BIS
> > > +        * connections have been accepted on a PA sync parent socket,
> > > +        * "reset" socket state, to allow future BIG re-sync procedures.
> > > +        */
> > > +       if (test_bit(BT_SK_PA_SYNC, &iso_pi(sk)->flags)) {
> > > +               /* Iterate through the list of bound BIS indices
> > > +                * and clear each BIS as they are accepted by the
> > > +                * user space, one by one.
> > > +                */
> > > +               for (int i = 0; i < iso_pi(sk)->bc_num_bis; i++) {
> > > +                       if (iso_pi(sk)->bc_bis[i] > 0) {
> > > +                               iso_pi(sk)->bc_bis[i] = 0;
> > > +                               iso_pi(sk)->bc_num_bis--;
> > > +                               break;
> > > +                       }
> > > +               }
> > > +
> > > +               if (iso_pi(sk)->bc_num_bis == 0) {
> > > +                       /* Once the last BIS was accepted, reset parent
> > > +                        * socket parameters to mark that the listening
> > > +                        * process for BIS connections has been completed:
> > > +                        *
> > > +                        * 1. Reset the DEFER setup flag on the parent sk.
> > > +                        * 2. Clear the flag marking that the BIG create
> > > +                        *    sync command is pending.
> > > +                        * 3. Transition socket state from BT_LISTEN to
> > > +                        *    BT_CONNECTED.
> > > +                        */
> > > +                       set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags);
> > > +                       clear_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags);
> > > +                       sk->sk_state = BT_CONNECTED;
> > > +               }
> > > +       }
> > > +
> > >  done:
> > >         release_sock(sk);
> > >         return err;
> > > --
> > > 2.43.0
> >
> > While testing this Ive run into the following problem:
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.12.0-rc6-g47ebf099106e #7428 Not tainted
> > ------------------------------------------------------
> > kworker/u5:2/38 is trying to acquire lock:
> > ffff88800224a248 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
> > iso_connect_cfm+0x563/0x13e0
> >
> > but task is already holding lock:
> > ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at:
> > hci_le_per_adv_report_evt+0x494/0x6d0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (hci_cb_list_lock){+.+.}-{4:4}:
> >        lock_acquire+0x1b6/0x510
> >        __mutex_lock+0x180/0x1a60
> >        hci_le_per_adv_report_evt+0x494/0x6d0
> >        hci_event_packet+0x54a/0xec0
> >        hci_rx_work+0x76c/0x11c0
> >        process_one_work+0x7d9/0x13d0
> >        worker_thread+0x5b7/0xf90
> >        kthread+0x293/0x360
> >        ret_from_fork+0x2f/0x70
> >        ret_from_fork_asm+0x1a/0x30
> >
> > -> #1 (&hdev->lock){+.+.}-{4:4}:
> >        lock_acquire+0x1b6/0x510
> >        __mutex_lock+0x180/0x1a60
> >        iso_sock_listen+0x2d4/0xe00
> >        __sys_listen+0x163/0x240
> >        __x64_sys_listen+0x4e/0x70
> >        do_syscall_64+0x71/0x140
> >        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
> >        check_prev_add+0x1b5/0x23f0
> >        __lock_acquire+0x2bdf/0x4580
> >        lock_acquire+0x1b6/0x510
> >        lock_sock_nested+0x36/0xd0
> >        iso_connect_cfm+0x563/0x13e0
> >        hci_le_per_adv_report_evt+0x4f8/0x6d0
> >        hci_event_packet+0x54a/0xec0
> >        hci_rx_work+0x76c/0x11c0
> >        process_one_work+0x7d9/0x13d0
> >        worker_thread+0x5b7/0xf90
> >        kthread+0x293/0x360
> >        ret_from_fork+0x2f/0x70
> >        ret_from_fork_asm+0x1a/0x30
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> >   sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> &hdev->lock -->
> > hci_cb_list_lock
> >
> >  Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(hci_cb_list_lock);
> >                                lock(&hdev->lock);
> >                                lock(hci_cb_list_lock);
> >   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
> >
> >  *** DEADLOCK ***
> >
> > 4 locks held by kworker/u5:2/38:
> >  #0: ffff888002213948 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
> > process_one_work+0xce8/0x13d0
> >  #1: ffff8880024a7da0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
> > at: process_one_work+0x758/0x13d0
> >  #2: ffff888002300078 (&hdev->lock){+.+.}-{4:4}, at:
> > hci_le_per_adv_report_evt+0x152/0x6d0
> >  #3: ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at:
> > hci_le_per_adv_report_evt+0x494/0x6d0
> >
> > While this doesn't seem to be new I think it is a good idea to try and
> > solve it so we don't run into deadlocks.
> 
> Looks like this is due our usage of a mutex as locking for hci_cb_list_lock, we
> could perhaps use a rwlock_t like its done for hci_dev_list_lock but I think we
> may still run into possible unsafe locking if there are multiple writers, so
> perhaps we could attempt to convert use RCU which does allows us to do
> certain operations
> lockless:
> 
> https://docs.ke/
> rnel.org%2Fnext%2FRCU%2FlistRCU.html&data=05%7C02%7Ciulia.tanasescu
> %40nxp.com%7C493e0b198a26497d5e2f08dd1320bfb8%7C686ea1d3bc2b4c
> 6fa92cd99c5c301635%7C0%7C0%7C638687753582549789%7CUnknown%7CT
> WFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXa
> W4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=Eyu7
> %2BLOcRhCHYA7wEpkikFHsUgJQKFHRAZlPFXcPn8s%3D&reserved=0
>

Even if I use RCU instead of mutex, I still get the following warning:

[   75.307983] ======================================================
[   75.307984] WARNING: possible circular locking dependency detected
[   75.307985] 6.12.0-rc6+ #22 Not tainted
[   75.307987] ------------------------------------------------------
[   75.307987] kworker/u81:2/2623 is trying to acquire lock:
[   75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
               at: iso_connect_cfm+0x253/0x840 [bluetooth]
[   75.308021] 
               but task is already holding lock:
[   75.308022] ffff8fdd61a10078 (&hdev->lock)
               hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[   75.308053] 
               which lock already depends on the new lock.
 
[   75.308054] 
               the existing dependency chain (in reverse order) is:
[   75.308055] 
               -> #1 (&hdev->lock){+.+.}-{3:3}:
[   75.308057]        __mutex_lock+0xad/0xc50
[   75.308061]        mutex_lock_nested+0x1b/0x30
[   75.308063]        iso_sock_listen+0x143/0x5c0 [bluetooth]
[   75.308085]        __sys_listen_socket+0x49/0x60
[   75.308088]        __x64_sys_listen+0x4c/0x90
[   75.308090]        x64_sys_call+0x2517/0x25f0
[   75.308092]        do_syscall_64+0x87/0x150
[   75.308095]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   75.308098] 
               -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[   75.308100]        __lock_acquire+0x155e/0x25f0
[   75.308103]        lock_acquire+0xc9/0x300
[   75.308105]        lock_sock_nested+0x32/0x90
[   75.308107]        iso_connect_cfm+0x253/0x840 [bluetooth]
[   75.308128]        hci_connect_cfm+0x6c/0x190 [bluetooth]
[   75.308155]        hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
[   75.308180]        hci_le_meta_evt+0xe7/0x200 [bluetooth]
[   75.308206]        hci_event_packet+0x21f/0x5c0 [bluetooth]
[   75.308230]        hci_rx_work+0x3ae/0xb10 [bluetooth]
[   75.308254]        process_one_work+0x212/0x740
[   75.308256]        worker_thread+0x1bd/0x3a0
[   75.308258]        kthread+0xe4/0x120
[   75.308259]        ret_from_fork+0x44/0x70
[   75.308261]        ret_from_fork_asm+0x1a/0x30
[   75.308263] 
               other info that might help us debug this:
 
[   75.308264]  Possible unsafe locking scenario:
 
[   75.308264]        CPU0             CPU1
[   75.308265]        ----             ----
[   75.308265]   lock(&hdev->lock);
[   75.308267]                         lock(sk_lock-
                                            AF_BLUETOOTH-BTPROTO_ISO);
[   75.308268]                         lock(&hdev->lock);
[   75.308269]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[   75.308270] 
                *** DEADLOCK ***
 
[   75.308271] 4 locks held by kworker/u81:2/2623:
[   75.308272]  #0: ffff8fdd66e52148 ((wq_completion)hci0#2)
                at: process_one_work+0x443/0x740
[   75.308276]  #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work))
                at: process_one_work+0x1ce/0x740
[   75.308280]  #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3},
                at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[   75.308304]  #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2},
                at: hci_connect_cfm+0x29/0x190 [bluetooth]

I think the root issue comes from an A->B/B->A type lock order with hdev
and sk. I submitted a patch series with a fix for this, and also
2 other issues that I found:
https://patchwork.kernel.org/project/bluetooth/cover/20241204122849.9000-1-iulia.tanasescu@xxxxxxx/

> >
> > --
> > Luiz Augusto von Dentz
> 
> 
> 
> --
> Luiz Augusto von Dentz

Regards,
Iulia




[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux