Re: [PATCH RFC 0/7] Fix a memory leak in Bluetooth L2CAP

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Marleen,

On Tue, Nov 7, 2023 at 3:46 AM Marleen Vos <marleen.vos@xxxxxxxxxxxxx> wrote:
>
> Hi Luiz,
>
> Because Olivier has been and still is being swamped with other work, I'm
> kind of trying to take over these patches. So far I can reproduce the
> memleak on a recent kernel without the patches.
>
> Olivier told me you added tests to check for the memleak in
> l2cap-tester. Can you point me towards more info on how exactly you run
> these tests, as the info I find on the web is rather minimal.
>
>  From what I read in the thread, it looks like your tests don't catch
> the memleak?

I'm currently on vacation, I should be back next week, try checking
l2cap-tester to emulate the leak as you said it was still
reproducible.

> Kind regards,
>
> Marleen
>
>
> On 14/09/2023 23:17, Luiz Augusto von Dentz wrote:
> > Hi Olivier,
> >
> > On Wed, Sep 13, 2023 at 3:25 PM Olivier L'Heureux
> > <olivier.lheureux@xxxxxxx>  wrote:
> >> Hello Luiz,
> >>
> >> Thanks for your review.
> >>
> >> On 05/09/2023 22:42, Luiz Augusto von Dentz wrote:
> >>> Hi Olivier,
> >>>
> >>> On Mon, Sep 4, 2023 at 3:12 PM Olivier L'Heureux
> >>> <olivier.lheureux@xxxxxxx>  wrote:
> >>>> Request for Comments
> >> [...]
> >>>> The "ble-memleak-repro" program reproduces the memory leak, if the
> >>>> kernel is not patched. Its source is in
> >>>> "package/ble-memleak-repro/ble-memleak-repro.c" [18].
> >>> We should probably create a test in l2cap-tester using SO_SNDTIMEO
> >>> first, so we can make sure CI test such case and we are able to detect
> >>> if the problem is reintroduced later:
> >>>
> >>> https://github.com/bluez/bluez/blob/master/tools/l2cap-tester.c
> >> I didn't know about "l2cap-tester.c". Agree, it would be great to
> >> integrate my test app into it. I could try, but I don't know the test
> >> framework yet.
> >>
> >>>> Memory Leak Fix
> >>>> ===============
> >>>>
> >>>> We have fixed the memory leak, see the patch series in
> >>>> "patches/linux/":
> >>>>
> >>>>    1. The first patch
> >>>>       "0001-ARM-dts-stm32-Add-Bluetooth-usart2-feature-on-stm32m.patch" [11]
> >>>>       enables Bluetooth on the DK2.
> >>> This above should probably be sent separately.
> >>>
> >>>>    2. The second patch
> >>>>       "0002-Bluetooth-add-many-traces-for-allocation-free-refcou.patch" [12]
> >>>>       adds many dynamic debug traces that help understand the kernel
> >>>>       behaviour and freeing problems.
> >>> I'm fine with this change, but we better use the likes of bt_dev_dbg
> >>> instead of BT_DBG.
> >> This commit intended to increase the visibility during debugging, and
> >> I was not intending it for a permanent presence in the kernel.
> >> But if you find it useful, I can submit a patch RFC v2 with
> >> "bt_dev_dbg()" instead of "BT_DBG()". Note that there is currently no
> >> "bt_dev_dbg()" in "l2cap_core.c" yet.
> >>
> >>>>       "0003-Bluetooth-L2CAP-use-refcount-on-struct-l2cap_chan-co.patch" [13]
> >>>>       and
> >>>>       "0004-Bluetooth-L2CAP-free-the-leaking-struct-l2cap_conn.patch" [14]
> >>>>       fix the memory leak.
> >>>>    4. Patches
> >>>>       "0005-Bluetooth-introduce-hci_conn_free-for-better-structu.patch" [15],
> >>>>       "0006-Bluetooth-L2CAP-inc-refcount-if-reuse-struct-l2cap_c.patch" [16]
> >>>>       and
> >>>>       "0007-Bluetooth-unlink-objects-to-avoid-use-after-free.patch" [17]
> >>>>       fixes the use-after-free that appears when the "struct l2cap_conn"
> >>>>       [5] and "struct hci_conn" [6] objects are freed.
> >>> These I'm not very comfortable applying as they are, I'm afraid there
> >>> could be regressions if they are not accompanied with proper tests,
> >>> besides I think there are less intrusive ways to cleanup l2cap_conn,
> >>> see below.
> >>>
> >>>> The commit messages explain why the commit is needed, which problem
> >>>> the commit solves, and how.
> >>>>
> >>>> The first and second patches are present for the memory leak
> >>>> reproduction only, they should not be part of a final fix.
> >>>>
> >>>> Patch Status
> >>>> ============
> >>>>
> >>>> As of writing, the memory leak is fixed. The fix opened the door to
> >>>> other problems, especially use-after-free, sometimes followed by
> >>>> crashes due to NULL dereferences. We think there are weak references
> >>>> (i.e. pointers that don't increment the refcounter) previously
> >>>> pointing to memory that was never freed, but now is freed. On kernels
> >>>> v5.13 and v5.15, patches 0005, 0006 and 0007 seem to fix the
> >>>> use-after-free and NULL dereferences, but not on kernel v6.5 yet.
> >>> I think the problem is that the lifetime of l2cap_conn shall be hooked
> >>> with hci_chan, but the likes of hci_chan_list_flush -> hci_chan_del
> >>> don't actually trigger l2cap_conn_del, which imo is the culprit here,
> >>> because connect_cfm/disconnect_cfm is not called when the connection
> >>> procedure has been aborted prematurely, so perhaps we shall get rid of
> >>> the likes of l2cap_connect_cfm/l2cap_disconn_cfm and instead do the
> >>> cleanup with in the following order:
> >>> hci_conn_cleanup->hci_chan_list_flush->hci_chan_del->l2cap_conn_del,
> >>> that way we avoid having multiple code paths attempting to cleanup
> >>> objects associated with hci_conn/hci_chan.
> >> I fully agree with your analysis, which correspond to mine: we should
> >> call "l2cap_conn_del()", it would properly clean the allocations in
> >> "l2cap_conn_add()".
> >> I have tried but it was not obvious to find the right place to call
> >> "l2cap_conn_del()" with the proper locking.
> >> As you write, connect_cfm/disconnect_cfm is not called when the
> >> connection is aborted, and that is the root cause of the memory leak.
> > Btw, I was trying to reproduce it with the following test set, but at
> > least kmemleak was not able to detect leaks of l2cap_conn:
> >
> > https://patchwork.kernel.org/project/bluetooth/list/?series=784343
> >
> >> Your proposal is most probably the best way to go.
> >>
> >>> I'd change hci_chan_create to take a del callback to avoid having
> >>> circular dependencies on one another though.
> >> Interesting, could you explain how you would do it? Perhaps point on
> >> a callback example?
> >>
> >>>> Reproducing with Buildroot
> >>>> ==========================
> >>>>
> >>>> We have reproduced and fixed the memory leak with Buildroot [7] and a
> >>>> "ble-memleak-repro" test application on an ST's Discovery Kit DK2 [4].
> >>>>
> >>>> The "ble-memleak-repro" repository [1] contains the sources of a
> >>>> complete external Buildroot customisation [8], with the patches, a
> >>>> README, and more explanations to reproduce the problem with Buildroot
> >>>> on a DK2.
> >>>>
> >>>> References:
> >>>> ===========
> >>>>
> >>>> - [1]:<https://gitlab.com/essensium-mind/ble-memleak-repro.git>
> >>>>          "ble-memleak-repro repository"
> >> [...]
> >>
> >> --
> >> Olivier L'Heureux



-- 
Luiz Augusto von Dentz





[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux