Now that the "struct l2cap_conn" memory leak [1] is fixed, we observe use-after-free errors. Arnout Vandecappelle has found the root cause: the if (conn) return conn; at the beginning of "l2cap_conn_add()". It reuses an existing "struct l2cap_conn *" instead of allocating a new one. But there is an incoherence in the reference counting: 1. In the normal case (no reuse), "l2cap_conn_add()" allocates a new "struct l2cap_conn", initiates its "ref" reference counter to 1 and returns it. The caller is responsible to eventually call "l2cap_conn_put()" to free the returned "struct l2cap_conn". 2. If "l2cap_conn_add()" reuses an existing "struct l2cap_conn", it will return it immediately. But the caller, which can not know if "l2cap_conn_add()" reused an existing "struct l2cap_conn" or not, will eventually call "l2cap_conn_put()", for which there were no corresponding "l2cap_conn_get()". Tracing the reuse showed the reuse was indeed the reason for the use-after-free: [...] [ 960.331756] l2cap_conn_add:7719: hcon 1f5f8bdf reuse conn b69c7ec5 [ 960.331798] ------------[ cut here ]------------ [ 960.339480] WARNING: CPU: 0 PID: 173 at lib/refcount.c:25 l2cap_conn_get+0x8c/0x94 [ 960.353863] refcount_t: addition on 0; use-after-free. [ 960.362924] Modules linked in: [ 960.368036] CPU: 0 PID: 173 Comm: ble-memleak-rep Not tainted 5.13.0 #19 [ 960.380245] Hardware name: STM32 (Device Tree Support) [ 960.387449] [<c010e9a4>] (unwind_backtrace) from [<c010af48>] (show_stack+0x10/0x14) [ 960.400742] [<c010af48>] (show_stack) from [<c07f98dc>] (dump_stack+0xb4/0xc8) [ 960.413501] [<c07f98dc>] (dump_stack) from [<c07f6ee0>] (__warn+0xb8/0x114) [ 960.425994] [<c07f6ee0>] (__warn) from [<c07f6fb4>] (warn_slowpath_fmt+0x78/0xac) [ 960.439016] [<c07f6fb4>] (warn_slowpath_fmt) from [<c07d2690>] (l2cap_conn_get+0x8c/0x94) [ 960.452752] [<c07d2690>] (l2cap_conn_get) from [<c07d92b0>] (__l2cap_chan_add+0x3c/0x1e4) [ 960.466486] [<c07d92b0>] (__l2cap_chan_add) from [<c07da8d0>] (l2cap_chan_connect+0x514/0x9c8) [ 960.480656] [<c07da8d0>] (l2cap_chan_connect) from [<c07e12b8>] (l2cap_sock_connect+0x144/0x21c) [ 960.495023] [<c07e12b8>] (l2cap_sock_connect) from [<c065c5e0>] (__sys_connect+0xc8/0xe0) [ 960.508927] [<c065c5e0>] (__sys_connect) from [<c0100060>] (ret_fast_syscall+0x0/0x58) [...] The solution to the incoherence is to strictly apply the rules of reference counting [2]. By returning a reference to the reused "struct l2cap_conn", "l2cap_conn_add()" creates a new reference, and this reference should be reference-counted. The: if (conn) return conn; must thus become: if (conn) return l2cap_conn_get(conn); References: - [1] "ble-memleak-repro" <https://gitlab.com/essensium-mind/ble-memleak-repro.git> - [2] "Adding reference counters (krefs) to kernel objects" <https://www.kernel.org/doc/html/latest/core-api/kref.html> Signed-off-by: Olivier L'Heureux <olivier.lheureux@xxxxxxxxxxxxxxxx> Signed-off-by: Olivier L'Heureux <olivier.lheureux@xxxxxxx> Suggested-by: Arnout Vandecappelle <arnout.vandecappelle@xxxxxxx> --- net/bluetooth/l2cap_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index f5dcb4a4fb15..5e4dd293b2a4 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -7844,8 +7844,8 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon) struct hci_chan *hchan; if (conn) { - BT_DBG("hcon %p reuse conn %p", hcon, conn); - return conn; + BT_DBG("hcon %p reuse conn %p with l2cap_conn_get()", hcon, conn); + return l2cap_conn_get(conn); } hchan = hci_chan_create(hcon); -- 2.39.2