Hi Marleen, On Tue, Nov 7, 2023 at 3:46 AM Marleen Vos <marleen.vos@xxxxxxxxxxxxx> wrote: > > Hi Luiz, > > Because Olivier has been and still is being swamped with other work, I'm > kind of trying to take over these patches. So far I can reproduce the > memleak on a recent kernel without the patches. > > Olivier told me you added tests to check for the memleak in > l2cap-tester. Can you point me towards more info on how exactly you run > these tests, as the info I find on the web is rather minimal. > > From what I read in the thread, it looks like your tests don't catch > the memleak? I'm currently on vacation, I should be back next week, try checking l2cap-tester to emulate the leak as you said it was still reproducible. > Kind regards, > > Marleen > > > On 14/09/2023 23:17, Luiz Augusto von Dentz wrote: > > Hi Olivier, > > > > On Wed, Sep 13, 2023 at 3:25 PM Olivier L'Heureux > > <olivier.lheureux@xxxxxxx> wrote: > >> Hello Luiz, > >> > >> Thanks for your review. > >> > >> On 05/09/2023 22:42, Luiz Augusto von Dentz wrote: > >>> Hi Olivier, > >>> > >>> On Mon, Sep 4, 2023 at 3:12 PM Olivier L'Heureux > >>> <olivier.lheureux@xxxxxxx> wrote: > >>>> Request for Comments > >> [...] > >>>> The "ble-memleak-repro" program reproduces the memory leak, if the > >>>> kernel is not patched. Its source is in > >>>> "package/ble-memleak-repro/ble-memleak-repro.c" [18]. > >>> We should probably create a test in l2cap-tester using SO_SNDTIMEO > >>> first, so we can make sure CI test such case and we are able to detect > >>> if the problem is reintroduced later: > >>> > >>> https://github.com/bluez/bluez/blob/master/tools/l2cap-tester.c > >> I didn't know about "l2cap-tester.c". Agree, it would be great to > >> integrate my test app into it. I could try, but I don't know the test > >> framework yet. > >> > >>>> Memory Leak Fix > >>>> =============== > >>>> > >>>> We have fixed the memory leak, see the patch series in > >>>> "patches/linux/": > >>>> > >>>> 1. The first patch > >>>> "0001-ARM-dts-stm32-Add-Bluetooth-usart2-feature-on-stm32m.patch" [11] > >>>> enables Bluetooth on the DK2. > >>> This above should probably be sent separately. > >>> > >>>> 2. The second patch > >>>> "0002-Bluetooth-add-many-traces-for-allocation-free-refcou.patch" [12] > >>>> adds many dynamic debug traces that help understand the kernel > >>>> behaviour and freeing problems. > >>> I'm fine with this change, but we better use the likes of bt_dev_dbg > >>> instead of BT_DBG. > >> This commit intended to increase the visibility during debugging, and > >> I was not intending it for a permanent presence in the kernel. > >> But if you find it useful, I can submit a patch RFC v2 with > >> "bt_dev_dbg()" instead of "BT_DBG()". Note that there is currently no > >> "bt_dev_dbg()" in "l2cap_core.c" yet. > >> > >>>> "0003-Bluetooth-L2CAP-use-refcount-on-struct-l2cap_chan-co.patch" [13] > >>>> and > >>>> "0004-Bluetooth-L2CAP-free-the-leaking-struct-l2cap_conn.patch" [14] > >>>> fix the memory leak. > >>>> 4. Patches > >>>> "0005-Bluetooth-introduce-hci_conn_free-for-better-structu.patch" [15], > >>>> "0006-Bluetooth-L2CAP-inc-refcount-if-reuse-struct-l2cap_c.patch" [16] > >>>> and > >>>> "0007-Bluetooth-unlink-objects-to-avoid-use-after-free.patch" [17] > >>>> fixes the use-after-free that appears when the "struct l2cap_conn" > >>>> [5] and "struct hci_conn" [6] objects are freed. > >>> These I'm not very comfortable applying as they are, I'm afraid there > >>> could be regressions if they are not accompanied with proper tests, > >>> besides I think there are less intrusive ways to cleanup l2cap_conn, > >>> see below. > >>> > >>>> The commit messages explain why the commit is needed, which problem > >>>> the commit solves, and how. > >>>> > >>>> The first and second patches are present for the memory leak > >>>> reproduction only, they should not be part of a final fix. > >>>> > >>>> Patch Status > >>>> ============ > >>>> > >>>> As of writing, the memory leak is fixed. The fix opened the door to > >>>> other problems, especially use-after-free, sometimes followed by > >>>> crashes due to NULL dereferences. We think there are weak references > >>>> (i.e. pointers that don't increment the refcounter) previously > >>>> pointing to memory that was never freed, but now is freed. On kernels > >>>> v5.13 and v5.15, patches 0005, 0006 and 0007 seem to fix the > >>>> use-after-free and NULL dereferences, but not on kernel v6.5 yet. > >>> I think the problem is that the lifetime of l2cap_conn shall be hooked > >>> with hci_chan, but the likes of hci_chan_list_flush -> hci_chan_del > >>> don't actually trigger l2cap_conn_del, which imo is the culprit here, > >>> because connect_cfm/disconnect_cfm is not called when the connection > >>> procedure has been aborted prematurely, so perhaps we shall get rid of > >>> the likes of l2cap_connect_cfm/l2cap_disconn_cfm and instead do the > >>> cleanup with in the following order: > >>> hci_conn_cleanup->hci_chan_list_flush->hci_chan_del->l2cap_conn_del, > >>> that way we avoid having multiple code paths attempting to cleanup > >>> objects associated with hci_conn/hci_chan. > >> I fully agree with your analysis, which correspond to mine: we should > >> call "l2cap_conn_del()", it would properly clean the allocations in > >> "l2cap_conn_add()". > >> I have tried but it was not obvious to find the right place to call > >> "l2cap_conn_del()" with the proper locking. > >> As you write, connect_cfm/disconnect_cfm is not called when the > >> connection is aborted, and that is the root cause of the memory leak. > > Btw, I was trying to reproduce it with the following test set, but at > > least kmemleak was not able to detect leaks of l2cap_conn: > > > > https://patchwork.kernel.org/project/bluetooth/list/?series=784343 > > > >> Your proposal is most probably the best way to go. > >> > >>> I'd change hci_chan_create to take a del callback to avoid having > >>> circular dependencies on one another though. > >> Interesting, could you explain how you would do it? Perhaps point on > >> a callback example? > >> > >>>> Reproducing with Buildroot > >>>> ========================== > >>>> > >>>> We have reproduced and fixed the memory leak with Buildroot [7] and a > >>>> "ble-memleak-repro" test application on an ST's Discovery Kit DK2 [4]. > >>>> > >>>> The "ble-memleak-repro" repository [1] contains the sources of a > >>>> complete external Buildroot customisation [8], with the patches, a > >>>> README, and more explanations to reproduce the problem with Buildroot > >>>> on a DK2. > >>>> > >>>> References: > >>>> =========== > >>>> > >>>> - [1]:<https://gitlab.com/essensium-mind/ble-memleak-repro.git> > >>>> "ble-memleak-repro repository" > >> [...] > >> > >> -- > >> Olivier L'Heureux -- Luiz Augusto von Dentz