Hi Luiz, > -----Original Message----- > From: Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> > Sent: Tuesday, December 3, 2024 12:29 AM > To: Iulia Tanasescu <iulia.tanasescu@xxxxxxx> > Cc: linux-bluetooth@xxxxxxxxxxxxxxx; Claudia Cristina Draghicescu > <claudia.rosu@xxxxxxx>; Mihai-Octavian Urzica <mihai- > octavian.urzica@xxxxxxx>; Andrei Istodorescu > <andrei.istodorescu@xxxxxxx> > Subject: Re: [PATCH 1/1] Bluetooth: iso: Allow BIG re-sync > > Caution: This is an external email. Please take care when clicking links or > opening attachments. When in doubt, report the message using the 'Report > this email' button > > > Hi Iulia, > > On Mon, Dec 2, 2024 at 5:00 PM Luiz Augusto von Dentz > <luiz.dentz@xxxxxxxxx> wrote: > > > > Hi Iulia, > > > > On Thu, Nov 28, 2024 at 10:54 AM Iulia Tanasescu > > <iulia.tanasescu@xxxxxxx> wrote: > > > > > > A Broadcast Sink might require BIG sync to be terminated and > > > re-established multiple times, while keeping the same PA sync handle > > > active. This can be possible if the configuration of the listening > > > (PA sync) socket is reset once all bound BISes are established and > > > accepted by the user space: > > > > > > 1. The DEFER setup flag needs to be reset on the parent socket, to > > > allow another BIG create sync procedure to be started on socket > > > read. > > > > > > 2. The BT_SK_BIG_SYNC flag needs to be cleared on the parent socket, > > > to allow another BIG create sync command to be sent. > > > > > > 3. The socket state needs to transition from BT_LISTEN to > > > BT_CONNECTED, to mark that the listening process has completed and > > > another one can be started if needed. > > > > > > Signed-off-by: Iulia Tanasescu <iulia.tanasescu@xxxxxxx> > > > --- > > > net/bluetooth/iso.c | 36 ++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 36 insertions(+) > > > > > > diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index > > > 1b40fd2b2f02..013edb19c4a1 100644 > > > --- a/net/bluetooth/iso.c > > > +++ b/net/bluetooth/iso.c > > > @@ -1268,6 +1268,42 @@ static int iso_sock_accept(struct socket > > > *sock, struct socket *newsock, > > > > > > BT_DBG("new socket %p", ch); > > > > > > + /* A Broadcast Sink might require BIG sync to be terminated > > > + * and re-established multiple times, while keeping the same > > > + * PA sync handle active. To allow this, once all BIS > > > + * connections have been accepted on a PA sync parent socket, > > > + * "reset" socket state, to allow future BIG re-sync procedures. > > > + */ > > > + if (test_bit(BT_SK_PA_SYNC, &iso_pi(sk)->flags)) { > > > + /* Iterate through the list of bound BIS indices > > > + * and clear each BIS as they are accepted by the > > > + * user space, one by one. > > > + */ > > > + for (int i = 0; i < iso_pi(sk)->bc_num_bis; i++) { > > > + if (iso_pi(sk)->bc_bis[i] > 0) { > > > + iso_pi(sk)->bc_bis[i] = 0; > > > + iso_pi(sk)->bc_num_bis--; > > > + break; > > > + } > > > + } > > > + > > > + if (iso_pi(sk)->bc_num_bis == 0) { > > > + /* Once the last BIS was accepted, reset parent > > > + * socket parameters to mark that the listening > > > + * process for BIS connections has been completed: > > > + * > > > + * 1. Reset the DEFER setup flag on the parent sk. > > > + * 2. Clear the flag marking that the BIG create > > > + * sync command is pending. > > > + * 3. Transition socket state from BT_LISTEN to > > > + * BT_CONNECTED. > > > + */ > > > + set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); > > > + clear_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags); > > > + sk->sk_state = BT_CONNECTED; > > > + } > > > + } > > > + > > > done: > > > release_sock(sk); > > > return err; > > > -- > > > 2.43.0 > > > > While testing this Ive run into the following problem: > > > > ====================================================== > > WARNING: possible circular locking dependency detected > > 6.12.0-rc6-g47ebf099106e #7428 Not tainted > > ------------------------------------------------------ > > kworker/u5:2/38 is trying to acquire lock: > > ffff88800224a248 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at: > > iso_connect_cfm+0x563/0x13e0 > > > > but task is already holding lock: > > ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at: > > hci_le_per_adv_report_evt+0x494/0x6d0 > > > > which lock already depends on the new lock. > > > > > > the existing dependency chain (in reverse order) is: > > > > -> #2 (hci_cb_list_lock){+.+.}-{4:4}: > > lock_acquire+0x1b6/0x510 > > __mutex_lock+0x180/0x1a60 > > hci_le_per_adv_report_evt+0x494/0x6d0 > > hci_event_packet+0x54a/0xec0 > > hci_rx_work+0x76c/0x11c0 > > process_one_work+0x7d9/0x13d0 > > worker_thread+0x5b7/0xf90 > > kthread+0x293/0x360 > > ret_from_fork+0x2f/0x70 > > ret_from_fork_asm+0x1a/0x30 > > > > -> #1 (&hdev->lock){+.+.}-{4:4}: > > lock_acquire+0x1b6/0x510 > > __mutex_lock+0x180/0x1a60 > > iso_sock_listen+0x2d4/0xe00 > > __sys_listen+0x163/0x240 > > __x64_sys_listen+0x4e/0x70 > > do_syscall_64+0x71/0x140 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: > > check_prev_add+0x1b5/0x23f0 > > __lock_acquire+0x2bdf/0x4580 > > lock_acquire+0x1b6/0x510 > > lock_sock_nested+0x36/0xd0 > > iso_connect_cfm+0x563/0x13e0 > > hci_le_per_adv_report_evt+0x4f8/0x6d0 > > hci_event_packet+0x54a/0xec0 > > hci_rx_work+0x76c/0x11c0 > > process_one_work+0x7d9/0x13d0 > > worker_thread+0x5b7/0xf90 > > kthread+0x293/0x360 > > ret_from_fork+0x2f/0x70 > > ret_from_fork_asm+0x1a/0x30 > > > > other info that might help us debug this: > > > > Chain exists of: > > sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> &hdev->lock --> > > hci_cb_list_lock > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > lock(hci_cb_list_lock); > > lock(&hdev->lock); > > lock(hci_cb_list_lock); > > lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); > > > > *** DEADLOCK *** > > > > 4 locks held by kworker/u5:2/38: > > #0: ffff888002213948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: > > process_one_work+0xce8/0x13d0 > > #1: ffff8880024a7da0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, > > at: process_one_work+0x758/0x13d0 > > #2: ffff888002300078 (&hdev->lock){+.+.}-{4:4}, at: > > hci_le_per_adv_report_evt+0x152/0x6d0 > > #3: ffffffffb0438668 (hci_cb_list_lock){+.+.}-{4:4}, at: > > hci_le_per_adv_report_evt+0x494/0x6d0 > > > > While this doesn't seem to be new I think it is a good idea to try and > > solve it so we don't run into deadlocks. > > Looks like this is due our usage of a mutex as locking for hci_cb_list_lock, we > could perhaps use a rwlock_t like its done for hci_dev_list_lock but I think we > may still run into possible unsafe locking if there are multiple writers, so > perhaps we could attempt to convert use RCU which does allows us to do > certain operations > lockless: > > https://docs.ke/ > rnel.org%2Fnext%2FRCU%2FlistRCU.html&data=05%7C02%7Ciulia.tanasescu > %40nxp.com%7C493e0b198a26497d5e2f08dd1320bfb8%7C686ea1d3bc2b4c > 6fa92cd99c5c301635%7C0%7C0%7C638687753582549789%7CUnknown%7CT > WFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXa > W4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=Eyu7 > %2BLOcRhCHYA7wEpkikFHsUgJQKFHRAZlPFXcPn8s%3D&reserved=0 > Even if I use RCU instead of mutex, I still get the following warning: [ 75.307983] ====================================================== [ 75.307984] WARNING: possible circular locking dependency detected [ 75.307985] 6.12.0-rc6+ #22 Not tainted [ 75.307987] ------------------------------------------------------ [ 75.307987] kworker/u81:2/2623 is trying to acquire lock: [ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO) at: iso_connect_cfm+0x253/0x840 [bluetooth] [ 75.308021] but task is already holding lock: [ 75.308022] ffff8fdd61a10078 (&hdev->lock) hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth] [ 75.308053] which lock already depends on the new lock. [ 75.308054] the existing dependency chain (in reverse order) is: [ 75.308055] -> #1 (&hdev->lock){+.+.}-{3:3}: [ 75.308057] __mutex_lock+0xad/0xc50 [ 75.308061] mutex_lock_nested+0x1b/0x30 [ 75.308063] iso_sock_listen+0x143/0x5c0 [bluetooth] [ 75.308085] __sys_listen_socket+0x49/0x60 [ 75.308088] __x64_sys_listen+0x4c/0x90 [ 75.308090] x64_sys_call+0x2517/0x25f0 [ 75.308092] do_syscall_64+0x87/0x150 [ 75.308095] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 75.308098] -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: [ 75.308100] __lock_acquire+0x155e/0x25f0 [ 75.308103] lock_acquire+0xc9/0x300 [ 75.308105] lock_sock_nested+0x32/0x90 [ 75.308107] iso_connect_cfm+0x253/0x840 [bluetooth] [ 75.308128] hci_connect_cfm+0x6c/0x190 [bluetooth] [ 75.308155] hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth] [ 75.308180] hci_le_meta_evt+0xe7/0x200 [bluetooth] [ 75.308206] hci_event_packet+0x21f/0x5c0 [bluetooth] [ 75.308230] hci_rx_work+0x3ae/0xb10 [bluetooth] [ 75.308254] process_one_work+0x212/0x740 [ 75.308256] worker_thread+0x1bd/0x3a0 [ 75.308258] kthread+0xe4/0x120 [ 75.308259] ret_from_fork+0x44/0x70 [ 75.308261] ret_from_fork_asm+0x1a/0x30 [ 75.308263] other info that might help us debug this: [ 75.308264] Possible unsafe locking scenario: [ 75.308264] CPU0 CPU1 [ 75.308265] ---- ---- [ 75.308265] lock(&hdev->lock); [ 75.308267] lock(sk_lock- AF_BLUETOOTH-BTPROTO_ISO); [ 75.308268] lock(&hdev->lock); [ 75.308269] lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); [ 75.308270] *** DEADLOCK *** [ 75.308271] 4 locks held by kworker/u81:2/2623: [ 75.308272] #0: ffff8fdd66e52148 ((wq_completion)hci0#2) at: process_one_work+0x443/0x740 [ 75.308276] #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)) at: process_one_work+0x1ce/0x740 [ 75.308280] #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3}, at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth] [ 75.308304] #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2}, at: hci_connect_cfm+0x29/0x190 [bluetooth] I think the root issue comes from an A->B/B->A type lock order with hdev and sk. I submitted a patch series with a fix for this, and also 2 other issues that I found: https://patchwork.kernel.org/project/bluetooth/cover/20241204122849.9000-1-iulia.tanasescu@xxxxxxx/ > > > > -- > > Luiz Augusto von Dentz > > > > -- > Luiz Augusto von Dentz Regards, Iulia