Hi Lee,

On Wed, Jul 6, 2022 at 3:53 AM Lee Jones <lee.jones@xxxxxxxxxx> wrote:
>
> On Tue, 05 Jul 2022, Luiz Augusto von Dentz wrote:
>
> > Hi Lee,
> >
> > On Wed, Jun 29, 2022 at 8:28 AM Lee Jones <lee.jones@xxxxxxxxxx> wrote:
> > >
> > > On Tue, 28 Jun 2022, Luiz Augusto von Dentz wrote:
> > >
> > > > Hi Eric, Lee,
> > > >
> > > > On Mon, Jun 27, 2022 at 4:39 PM Luiz Augusto von Dentz
> > > > <luiz.dentz@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi Eric, Lee,
> > > > >
> > > > > On Mon, Jun 27, 2022 at 7:41 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jun 22, 2022 at 10:27 AM Lee Jones <lee.jones@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > This change prevents a use-after-free caused by one of the worker
> > > > > > > threads starting up (see below) *after* the final channel reference
> > > > > > > has been put() during sock_close() but *before* the references to the
> > > > > > > channel have been destroyed.
> > > > > > >
> > > > > > > refcount_t: increment on 0; use-after-free.
> > > > > > > BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
> > > > > > > Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
> > > > > > >
> > > > > > > CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
> > > > > > > Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
> > > > > > > Workqueue: hci0 hci_rx_work
> > > > > > > Call trace:
> > > > > > > dump_backtrace+0x0/0x378
> > > > > > > show_stack+0x20/0x2c
> > > > > > > dump_stack+0x124/0x148
> > > > > > > print_address_description+0x80/0x2e8
> > > > > > > __kasan_report+0x168/0x188
> > > > > > > kasan_report+0x10/0x18
> > > > > > > __asan_load4+0x84/0x8c
> > > > > > > refcount_dec_and_test+0x20/0xd0
> > > > > > > l2cap_chan_put+0x48/0x12c
> > > > > > > l2cap_recv_frame+0x4770/0x6550
> > > > > > > l2cap_recv_acldata+0x44c/0x7a4
> > > > > > > hci_acldata_packet+0x100/0x188
> > > > > > > hci_rx_work+0x178/0x23c
> > > > > > > process_one_work+0x35c/0x95c
> > > > > > > worker_thread+0x4cc/0x960
> > > > > > > kthread+0x1a8/0x1c4
> > > > > > > ret_from_fork+0x10/0x18
> > > > > > >
> > > > > > > Cc: stable@xxxxxxxxxx
> > > > > >
> > > > > > When was the bug added ? (Fixes: tag please)
> > > > > >
> > > > > > > Cc: Marcel Holtmann <marcel@xxxxxxxxxxxx>
> > > > > > > Cc: Johan Hedberg <johan.hedberg@xxxxxxxxx>
> > > > > > > Cc: Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx>
> > > > > > > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> > > > > > > Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > > > > > Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > > > > > Cc: Paolo Abeni <pabeni@xxxxxxxxxx>
> > > > > > > Cc: linux-bluetooth@xxxxxxxxxxxxxxx
> > > > > > > Cc: netdev@xxxxxxxxxxxxxxx
> > > > > > > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  net/bluetooth/l2cap_core.c | 4 ++--
> > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > > > > > > index ae78490ecd3d4..82279c5919fd8 100644
> > > > > > > --- a/net/bluetooth/l2cap_core.c
> > > > > > > +++ b/net/bluetooth/l2cap_core.c
> > > > > > > @@ -483,9 +483,7 @@ static void l2cap_chan_destroy(struct kref *kref)
> > > > > > >
> > > > > > >  	BT_DBG("chan %p", chan);
> > > > > > >
> > > > > > > -	write_lock(&chan_list_lock);
> > > > > > >  	list_del(&chan->global_l);
> > > > > > > -	write_unlock(&chan_list_lock);
> > > > > > >
> > > > > > >  	kfree(chan);
> > > > > > >  }
> > > > > > > @@ -501,7 +499,9 @@ void l2cap_chan_put(struct l2cap_chan *c)
> > > > > > >  {
> > > > > > >  	BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
> > > > > > >
> > > > > > > +	write_lock(&chan_list_lock);
> > > > > > >  	kref_put(&c->kref, l2cap_chan_destroy);
> > > > > > > +	write_unlock(&chan_list_lock);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(l2cap_chan_put);
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I do not think this patch is correct.
> > > > > >
> > > > > > a kref does not need to be protected by a write lock.
> > > > > >
> > > > > > This might shuffle things enough to work around a particular repro you have.
> > > > > >
> > > > > > If the patch was correct why not protect kref_get() sides ?
> > > > > >
> > > > > > Before the &hdev->rx_work is scheduled (queue_work(hdev->workqueue,
> > > > > > &hdev->rx_work),
> > > > > > a reference must be taken.
> > > > > >
> > > > > > Then this reference must be released at the end of hci_rx_work() or
> > > > > > when hdev->workqueue
> > > > > > is canceled.
> > > > > >
> > > > > > This refcount is not needed _if_ the workqueue is properly canceled at
> > > > > > device dismantle,
> > > > > > in a synchronous way.
> > > > > >
> > > > > > I do not see this hdev->rx_work being canceled, maybe this is the real issue.
> > > > > >
> > > > > > There is a call to drain_workqueue() but this is not enough I think,
> > > > > > because hci_recv_frame()
> > > > > > can re-arm
> > > > > > queue_work(hdev->workqueue, &hdev->rx_work);
> > > > >
> > > > > I suspect this likely a refcount problem, we do l2cap_get_chan_by_scid:
> > > > >
> > > > > /* Find channel with given SCID.
> > > > >  * Returns locked channel. */
> > > > > static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn
> > > > > *conn, u16 cid)
> > > > >
> > > > > So we return a locked channel but that doesn't prevent another thread
> > > > > to call l2cap_chan_put which doesn't care about l2cap_chan_lock so
> > > > > perhaps we actually need to host a reference while we have the lock,
> > > > > at least we do something like that on l2cap_sock.c:
> > > > >
> > > > > l2cap_chan_hold(chan);
> > > > > l2cap_chan_lock(chan);
> > > > >
> > > > > __clear_chan_timer(chan);
> > > > > l2cap_chan_close(chan, ECONNRESET);
> > > > > l2cap_sock_kill(sk);
> > > > >
> > > > > l2cap_chan_unlock(chan);
> > > > > l2cap_chan_put(chan);
> > > >
> > > > Perhaps something like this:
> > >
> > > I'm struggling to apply this for test:
> > >
> > > "error: corrupt patch at line 6"
> >
> > Check with the attached patch.
>
> With the patch applied:
>
> [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.
> [ 188.825418][ T75] refcount_t: addition on 0; use-after-free.

It looks like the changes only make the issue more visible: we are now
trying to take a reference when the refcount is already 0. That suggests
the design is not quite right, since the channel is removed from the
list only when it is destroyed, while we probably need to remove it
earlier. How about using kref_get_unless_zero(), which appears to have
been introduced for exactly such cases (patch attached)?

Luiz Augusto von Dentz
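
To illustrate the "take a reference unless the count is already zero"
idea in isolation, here is a rough userspace sketch. It is not the
kernel implementation of kref_get_unless_zero(); it assumes C11 atomics,
and struct chan, chan_init(), chan_hold_unless_zero() and chan_put() are
hypothetical stand-ins for the L2CAP channel and its helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct l2cap_chan (not kernel code). */
struct chan {
	atomic_uint refcnt;	/* plays the role of chan->kref */
	bool released;		/* set by the last put, for demonstration only */
};

static void chan_init(struct chan *c)
{
	atomic_init(&c->refcnt, 1);	/* creator owns the first reference */
	c->released = false;
}

/*
 * Take a reference only if the count is still non-zero.
 * Returns the channel on success, or NULL if the last reference has
 * already been dropped and the object is on its way to being freed.
 */
static struct chan *chan_hold_unless_zero(struct chan *c)
{
	unsigned int old = atomic_load(&c->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&c->refcnt, &old, old + 1))
			return c;
	}

	return NULL;
}

/* Drop a reference; the final put runs the release step. */
static void chan_put(struct chan *c)
{
	unsigned int prev = atomic_fetch_sub(&c->refcnt, 1);

	assert(prev != 0);		/* a put on 0 would be an underflow */
	if (prev == 1)
		c->released = true;	/* real code would unlink and free here */
}
```

With this shape, a lookup that still finds the object on a list after
the final put gets NULL back instead of resurrecting a dead object,
which is the behaviour the attached patch relies on in the
l2cap_get_chan_by_*() helpers.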
From 2805374e1c05c6bba92efb0b472949da1e00dcfb Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz@xxxxxxxxx>
Date: Tue, 28 Jun 2022 15:46:04 -0700
Subject: [PATCH] Bluetooth: L2CAP: WIP

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@xxxxxxxxx>
---
 include/net/bluetooth/l2cap.h |  1 +
 net/bluetooth/l2cap_core.c    | 52 ++++++++++++++++++++++++++++-------
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
index 3c4f550e5a8b..2f766e3437ce 100644
--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -847,6 +847,7 @@ enum {
 };
 
 void l2cap_chan_hold(struct l2cap_chan *c);
+struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c);
 void l2cap_chan_put(struct l2cap_chan *c);
 
 static inline void l2cap_chan_lock(struct l2cap_chan *chan)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 09ecaf556de5..f9ba584217fd 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -111,7 +111,8 @@ static struct l2cap_chan *__l2cap_get_chan_by_scid(struct l2cap_conn *conn,
 }
 
 /* Find channel with given SCID.
- * Returns locked channel. */
+ * Returns a reference locked channel.
+ */
 static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
 						 u16 cid)
 {
@@ -119,15 +120,19 @@ static struct l2cap_chan *l2cap_get_chan_by_scid(struct l2cap_conn *conn,
 
 	mutex_lock(&conn->chan_lock);
 	c = __l2cap_get_chan_by_scid(conn, cid);
-	if (c)
-		l2cap_chan_lock(c);
+	if (c) {
+		/* Only lock if chan reference is not 0 */
+		c = l2cap_chan_hold_unless_zero(c);
+		if (c)
+			l2cap_chan_lock(c);
+	}
 	mutex_unlock(&conn->chan_lock);
 
 	return c;
 }
 
 /* Find channel with given DCID.
- * Returns locked channel.
+ * Returns a reference locked channel.
  */
 static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
 						 u16 cid)
@@ -136,8 +141,12 @@ static struct l2cap_chan *l2cap_get_chan_by_dcid(struct l2cap_conn *conn,
 
 	mutex_lock(&conn->chan_lock);
 	c = __l2cap_get_chan_by_dcid(conn, cid);
-	if (c)
-		l2cap_chan_lock(c);
+	if (c) {
+		/* Only lock if chan reference is not 0 */
+		c = l2cap_chan_hold_unless_zero(c);
+		if (c)
+			l2cap_chan_lock(c);
+	}
 	mutex_unlock(&conn->chan_lock);
 
 	return c;
@@ -162,8 +171,12 @@ static struct l2cap_chan *l2cap_get_chan_by_ident(struct l2cap_conn *conn,
 
 	mutex_lock(&conn->chan_lock);
 	c = __l2cap_get_chan_by_ident(conn, ident);
-	if (c)
-		l2cap_chan_lock(c);
+	if (c) {
+		/* Only lock if chan reference is not 0 */
+		c = l2cap_chan_hold_unless_zero(c);
+		if (c)
+			l2cap_chan_lock(c);
+	}
 	mutex_unlock(&conn->chan_lock);
 
 	return c;
@@ -497,6 +510,16 @@ void l2cap_chan_hold(struct l2cap_chan *c)
 	kref_get(&c->kref);
 }
 
+struct l2cap_chan *l2cap_chan_hold_unless_zero(struct l2cap_chan *c)
+{
+	BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
+
+	if (!kref_get_unless_zero(&c->kref))
+		return NULL;
+
+	return c;
+}
+
 void l2cap_chan_put(struct l2cap_chan *c)
 {
 	BT_DBG("chan %p orig refcnt %u", c, kref_read(&c->kref));
@@ -4464,6 +4487,7 @@ static inline int l2cap_config_req(struct l2cap_conn *conn,
 
 unlock:
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 	return err;
 }
 
@@ -4578,6 +4602,7 @@ static inline int l2cap_config_rsp(struct l2cap_conn *conn,
 
 done:
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 	return err;
 }
 
@@ -5305,6 +5330,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn,
 	l2cap_send_move_chan_rsp(chan, result);
 
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 
 	return 0;
 }
@@ -5397,6 +5423,7 @@ static void l2cap_move_continue(struct l2cap_conn *conn, u16 icid, u16 result)
 	}
 
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 }
 
 static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
@@ -5426,6 +5453,7 @@ static void l2cap_move_fail(struct l2cap_conn *conn, u8 ident, u16 icid,
 	l2cap_send_move_chan_cfm(chan, L2CAP_MC_UNCONFIRMED);
 
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 }
 
 static int l2cap_move_channel_rsp(struct l2cap_conn *conn,
@@ -5489,6 +5517,7 @@ static int l2cap_move_channel_confirm(struct l2cap_conn *conn,
 	l2cap_send_move_chan_cfm_rsp(conn, cmd->ident, icid);
 
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 
 	return 0;
 }
@@ -5524,6 +5553,7 @@ static inline int l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn,
 	}
 
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 
 	return 0;
 }
@@ -5896,12 +5926,11 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
 	if (credits > max_credits) {
 		BT_ERR("LE credits overflow");
 		l2cap_send_disconn_req(chan, ECONNRESET);
-		l2cap_chan_unlock(chan);
 
 		/* Return 0 so that we don't trigger an unnecessary
 		 * command reject packet.
 		 */
-		return 0;
+		goto unlock;
 	}
 
 	chan->tx_credits += credits;
@@ -5912,7 +5941,9 @@ static inline int l2cap_le_credits(struct l2cap_conn *conn,
 	if (chan->tx_credits)
 		chan->ops->resume(chan);
 
+unlock:
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 
 	return 0;
 }
@@ -7598,6 +7629,7 @@ static void l2cap_data_channel(struct l2cap_conn *conn, u16 cid,
 
 done:
 	l2cap_chan_unlock(chan);
+	l2cap_chan_put(chan);
 }
 
 static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
-- 
2.35.3
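
A note on the caller side of the patch: once the lookup helpers return a
channel that is both referenced and locked, every exit path has to
unlock and then put, which is why l2cap_le_credits() switches from an
early return to a shared "unlock" label. Below is a minimal
single-threaded userspace sketch of that shape. All names (struct chan,
get_chan(), add_credits(), MAX_CREDITS) are hypothetical, the lock is
modelled as a plain flag, and the -1 return for a failed lookup is an
assumption made for the example rather than what the kernel returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CREDITS 65535u	/* hypothetical cap, for illustration only */

/* Hypothetical stand-in for struct l2cap_chan. */
struct chan {
	bool locked;		/* models l2cap_chan_lock()/unlock() */
	unsigned int refcnt;	/* plain counter: the demo is single-threaded */
	unsigned int tx_credits;
};

static void chan_lock(struct chan *c)
{
	assert(!c->locked);
	c->locked = true;
}

static void chan_unlock(struct chan *c)
{
	assert(c->locked);
	c->locked = false;
}

/* Models a lookup that returns the channel referenced and locked. */
static struct chan *get_chan(struct chan *c)
{
	if (!c || c->refcnt == 0)
		return NULL;	/* already dead: do not resurrect it */

	c->refcnt++;
	chan_lock(c);
	return c;
}

static int add_credits(struct chan *entry, unsigned int credits)
{
	struct chan *c = get_chan(entry);

	if (!c)
		return -1;	/* assumed error value for a failed lookup */

	if (credits > MAX_CREDITS - c->tx_credits)
		goto unlock;	/* overflow: skip the update, share the cleanup */

	c->tx_credits += credits;

unlock:
	chan_unlock(c);
	c->refcnt--;		/* drop the reference taken by get_chan() */
	return 0;
}
```

Keeping a single exit label means a new early-exit path cannot forget
either the unlock or the put, which matters more now that a missed put
leaks the channel.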