Hi Olivier, On Mon, Sep 4, 2023 at 3:12 PM Olivier L'Heureux <olivier.lheureux@xxxxxxx> wrote: > > Request for Comments > > Summary > ======= > > We have found a memory leak in the Bluetooth L2CAP subsystem and fixed > it, but this opened the doors to use-after-free problems, which are > not completely fixed yet. This patch series present a way to reproduce > it, the proposed fix and the status. There is more detailed > documentation in [1]. > > Memory Leak Overview > ==================== > > We have found a memory leak in the L2CAP layer of Linux's Bluetooth > Subsystem, the same as reported by syzbot in "[syzbot] [bluetooth?] > memory leak in hci_conn_add (2)" (2023-09-02 23:25:00 -0700) [19]. > > We can reproduce it. When, in a loop, a user-mode program: > > 1. Opens a Bluetooth socket at the L2CAP layer: > > sockfd = socket(AF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP); > > 2. Set a timeout on the socket: > > timeout.tv_sec = 2; > timeout.tv_usec = 0; > > err = setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout)); > > 3. Connect to a specific Bluetooth address: > > struct sockaddr_l2 ble_addr; > > memset(&ble_addr, 0, sizeof(ble_addr)); > ble_addr.l2_family = AF_BLUETOOTH; > ble_addr.l2_psm = htobs(0x80 /* L2CAP_PSM_LE_DYN_START */); > // l2_bdaddr=00:0a:07:a3:22:00 > ble_addr.l2_bdaddr = *(bdaddr_t*)("\x00\x0a\x07\xa3\x22\x00"); > ble_addr.l2_cid = htobs(0); > ble_addr.l2_bdaddr_type = BDADDR_LE_PUBLIC; > > err = connect(sockfd, (struct sockaddr*)&ble_addr, sizeof(ble_addr)); > > and when the user program does those three steps in a loop, then the > kernel leaks the "struct l2cap_conn" [5] and "struct hci_conn" [6] > objects allocated in "l2cap_conn_add()" [2] and "hci_conn_add()" [3]. > Those objects are never freed. > > The "ble-memleak-repro" program reproduces the memory leak, if the > kernel is not patched. Its source is in > "package/ble-memleak-repro/ble-memleak-repro.c" [18]. We should probably create a test in l2cap-tester using SO_SNDTIMEO first, so we can make sure CI test such case and we are able to detect if the problem is reintroduced later: https://github.com/bluez/bluez/blob/master/tools/l2cap-tester.c > Memory Leak Fix > =============== > > We have fixed the memory leak, see the patch series in > "patches/linux/": > > 1. The first patch > "0001-ARM-dts-stm32-Add-Bluetooth-usart2-feature-on-stm32m.patch" [11] > enables Bluetooth on the DK2. This above should probably be sent separately. > 2. The second patch > "0002-Bluetooth-add-many-traces-for-allocation-free-refcou.patch" [12] > adds many dynamic debug traces that help understand the kernel > behaviour and freeing problems. I'm fine with this change, but we better use the likes of bt_dev_dbg instead of BT_DBG. > 3. Patches > "0003-Bluetooth-L2CAP-use-refcount-on-struct-l2cap_chan-co.patch" [13] > and > "0004-Bluetooth-L2CAP-free-the-leaking-struct-l2cap_conn.patch" [14] > fix the memory leak. > 4. Patches > "0005-Bluetooth-introduce-hci_conn_free-for-better-structu.patch" [15], > "0006-Bluetooth-L2CAP-inc-refcount-if-reuse-struct-l2cap_c.patch" [16] > and > "0007-Bluetooth-unlink-objects-to-avoid-use-after-free.patch" [17] > fixes the use-after-free that appears when the "struct l2cap_conn" > [5] and "struct hci_conn" [6] objects are freed. These I'm not very comfortable applying as they are, I'm afraid there could be regressions if they are not accompanied with proper tests, besides I think there are less intrusive ways to cleanup l2cap_conn, see below. > The commit messages explain why the commit is needed, which problem > the commit solves, and how. > > The first and second patches are present for the memory leak > reproduction only, they should not be part of a final fix. > > Patch Status > ============ > > As of writing, the memory leak is fixed. The fix opened the door to > other problems, especially use-after-free, sometimes followed by > crashes due to NULL dereferences. We think there are weak references > (i.e. pointers that don't increment the refcounter) previously > pointing to memory that was never freed, but now is freed. On kernels > v5.13 and v5.15, patches 0005, 0006 and 0007 seem to fix the > use-after-free and NULL dereferences, but not on kernel v6.5 yet. I think the problem is that the lifetime of l2cap_conn shall be hooked with hci_chan, but the likes of hci_chan_list_flush -> hci_chan_del don't actually trigger l2cap_conn_del, which imo is the culprit here, because connect_cfm/disconnect_cfm is not called when the connection procedure has been aborted prematurely, so perhaps we shall get rid of the likes of l2cap_connect_cfm/l2cap_disconn_cfm and instead do the cleanup with in the following order: hci_conn_cleanup->hci_chan_list_flush->hci_chan_del->l2cap_conn_del, that way we avoid having multiple code paths attempting to cleanup objects associated with hci_conn/hci_chan. I'd change hci_chan_create to take a del callback to avoid having circular dependencies on one another though. > Reproducing with Buildroot > ========================== > > We have reproduced and fixed the memory leak with Buildroot [7] and a > "ble-memleak-repro" test application on an ST's Discovery Kit DK2 [4]. > > The "ble-memleak-repro" repository [1] contains the sources of a > complete external Buildroot customisation [8], with the patches, a > README, and more explanations to reproduce the problem with Buildroot > on a DK2. > > References: > =========== > > - [1]: <https://gitlab.com/essensium-mind/ble-memleak-repro.git> > "ble-memleak-repro repository" > > - [2]: <https://elixir.bootlin.com/linux/v6.5/source/net/bluetooth/l2cap_core.c#L7833> > "l2cap_conn_add()" > > - [3]: <https://elixir.bootlin.com/linux/v6.5/source/net/bluetooth/hci_conn.c#L986> > "hci_conn_add()" > > - [4]: <https://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-mpu-eval-tools/stm32-mcu-mpu-eval-tools/stm32-discovery-kits/stm32mp157c-dk2.html> > "STM32MP157C-DK2 (DK2)" > > - [5]: <https://elixir.bootlin.com/linux/v6.5/source/include/net/bluetooth/l2cap.h#L674> > "struct l2cap_conn" > > - [6]: <https://elixir.bootlin.com/linux/v6.5/source/include/net/bluetooth/hci_core.h#L685> > "struct hci_conn" > > - [7]: <https://buildroot.org/> > "Buildroot is a tool to generate embedded Linux systems" > > - [8]: <https://buildroot.org/downloads/manual/manual.html#outside-br-custom> > "9.2. Keeping customizations outside of Buildroot" > > - [11]: <patches/linux/0001-ARM-dts-stm32-Add-Bluetooth-usart2-feature-on-stm32m.patch> > "0001-ARM-dts-stm32-Add-Bluetooth-usart2-feature-on-stm32m.patch" > > - [12]: <patches/linux/0002-Bluetooth-add-many-traces-for-allocation-free-refcou.patch> > "0002-Bluetooth-add-many-traces-for-allocation-free-refcou.patch" > > - [13]: <patches/linux/0003-Bluetooth-L2CAP-use-refcount-on-struct-l2cap_chan-co.patch> > "0003-Bluetooth-L2CAP-use-refcount-on-struct-l2cap_chan-co.patch" > > - [14]: <patches/linux/0004-Bluetooth-L2CAP-free-the-leaking-struct-l2cap_conn.patch> > "0004-Bluetooth-L2CAP-free-the-leaking-struct-l2cap_conn.patch" > > - [15]: <patches/linux/0005-Bluetooth-introduce-hci_conn_free-for-better-structu.patch> > "0005-Bluetooth-introduce-hci_conn_free-for-better-structu.patch" > > - [16]: <patches/linux/0006-Bluetooth-L2CAP-inc-refcount-if-reuse-struct-l2cap_c.patch> > "0006-Bluetooth-L2CAP-inc-refcount-if-reuse-struct-l2cap_c.patch" > > - [17]: <patches/linux/0007-Bluetooth-unlink-objects-to-avoid-use-after-free.patch> > "0007-Bluetooth-unlink-objects-to-avoid-use-after-free.patch" > > - [18]: <https://gitlab.com/essensium-mind/ble-memleak-repro/-/blob/main/package/ble-memleak-repro/ble-memleak-repro.c?ref_type=heads> > "ble-memleak-repro.c" > > - [19]: <https://lore.kernel.org/linux-bluetooth/0000000000000fd01206046e7410@xxxxxxxxxx/T/#u> > "[syzbot] [bluetooth?] memory leak in hci_conn_add (2)" > 2023-09-02 23:25:00 -0700 > > --- > > Christophe Roullier (1): > ARM: dts: stm32: Add Bluetooth (usart2) feature on stm32mp157x > > Olivier L'Heureux (6): > Bluetooth: add many traces for allocation/free/refcounting > Bluetooth: L2CAP: use refcount on struct l2cap_chan->conn > Bluetooth: L2CAP: free the leaking struct l2cap_conn > Bluetooth: introduce hci_conn_free() for better structure > Bluetooth: L2CAP: inc refcount if reuse struct l2cap_conn > Bluetooth: unlink objects to avoid use-after-free > > arch/arm/boot/dts/st/stm32mp157c-dk2.dts | 11 ++++++- > include/net/bluetooth/hci_core.h | 7 +++-- > net/bluetooth/hci_conn.c | 18 ++++++++++++ > net/bluetooth/hci_sysfs.c | 8 ++++- > net/bluetooth/l2cap_core.c | 37 ++++++++++++++++++++---- > 5 files changed, 72 insertions(+), 9 deletions(-) > > > base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c > -- > 2.39.2 > -- Luiz Augusto von Dentz