Hi Jonas, On Fri, Feb 9, 2024 at 8:36 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote: > > Hi Jonas, > > On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@xxxxxxx> wrote: > > > > Hi everyone! > > > > On 08.02.24 16:32, syzbot wrote: > > > syzbot has bisected this issue to: > > > > > > commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b > > > Author: Jonas Dreßler <verdre@xxxxxxx> > > > Date: Tue Feb 6 11:08:13 2024 +0000 > > > > > > Bluetooth: hci_conn: Only do ACL connections sequentially > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000 > > > start commit: b1d3a0e70c38 Add linux-next specific files for 20240208 > > > git tree: linux-next > > > final oops: https://syzkaller.appspot.com/x/report.txt?x=174f8550180000 > > > console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868 > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000 > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000 > > > > > > Reported-by: syzbot+3f0a39be7a2035700868@xxxxxxxxxxxxxxxxxxxxxxxxx > > > Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially") > > > > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > > > > Hmm, looking at the backtraces, I think the issue that the introduction of the > > sequential connect has introduced another async case: In hci_connect_acl(), when > > we call hci_acl_create_connection_sync(), the conn state no longer immediately > > gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync > > queue actually executes __hci_acl_create_connection_sync(). > > Need to double check but I think we do set BT_CONNECT in case of LE > when it is queued so which shall prevent it to be queued multiple > times. > > > This means that now hci_connect_acl() is happy to do multiple > > hci_acl_create_connection_sync calls, and the hci_sync machinery will happily > > execute them right after each other. Then the newly introduced hci_abort_conn_sync() > > in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn > > object, so the second time we enter __hci_acl_create_connection_sync(), > > things blow up. > > > > It looks to me like in theory the hci_connect_le_sync() logic is prone to a > > similar issue, but in practice that's prohibited because in hci_connect_le_sync() > > we lookup whether the conn object still exists and bail out if it doesn't. > > > > Even for LE though I think we can queue multiple hci_connect_le_sync() calls > > and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection > > state actually is? > > > > So assuming this analysis is correct, what do we do to fix this? It seems to me > > that > > > > 1) we want a BT_CONNECT_QUEUED state for connections, so that the state > > machine covers this additional stage that we have for ACL and LE connections now. > > BT_CONNECT already indicates that connection procedure is in progress. > > > 2) the conn object can still disappear while the __hci_acl_create_connection_sync() > > is queued, so we need something like the "if conn doesn't exist anymore, bail out" > > check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too. > > Btw, I'd probably clean up the connect function and create something > like hci_connect/hci_connect_sync which takes care of the details > internally like it was done to abort. > > > That said, the current check in hci_connect_le_sync() that's using the connection > > handle to lookup the conn does not seem great, aren't these handles re-used > > after connections are torn down? > > Well we could perhaps do a lookup by pointer to see if the connection > hasn't been removed in the meantime, that said to force a clash on the > handles it need to happen in between abort, which frees the handle, > and connect, anyway the real culprit here is that we should be able to > abort the cmd_sync callback like we do in LE: > > https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943 > > That way we stop the connect callback to run and don't have to worry > about handle re-use. https://patchwork.kernel.org/project/bluetooth/patch/20240209141612.3554051-1-luiz.dentz@xxxxxxxxx/ > > > Cheers, > > Jonas > > > > -- > Luiz Augusto von Dentz -- Luiz Augusto von Dentz