Re: [PATCH v2] Bluetooth: Fix UAF for hci_chan

Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> · Tue, 10 Oct 2023 11:51:41 -0700

Hi Pauli,

On Tue, Oct 10, 2023 at 11:32 AM Pauli Virtanen <pav@xxxxxx> wrote:
>
> Hi,
>
> ti, 2023-10-10 kello 20:49 +0800, Ziyang Xuan (William)
> kirjoitti:[clip]
> > > >
> > > > - if (atomic_dec_and_test(&conn->refcnt)) {
> > > > + if (atomic_dec_and_test(&conn->refcnt) &&
> > > > +     !test_bit(HCI_CONN_DISC, &conn->flags)) {
> > > >           unsigned long timeo;
> > >
> > > hci_abort_conn already checks if conn->abort_reason was already set, to
> > > avoid queuing abort for the same conn again. Does this flag not try to
> > > do the same thing?
> >
> > That is not enough. hci_conn occur UAF in hci_proto_disconn_ind() before enter
> > hci_abort_conn().
>
> Thanks, this was not clear to me from the patch.
>
> So the problem is that the cancel_delayed_work_sync(&conn->disc_work)
> in hci_conn_del doesn't help in a few cases:
>
> 1. [task A] hci_conn_del(conn) entered
> 2. [A] cancel_delayed_work_sync(conn->disc_work)
> 3. [B] concurrent hci_conn_drop(conn) queues disc_work again
> 4. [A] hci_conn_del finishes
> 5. UAF when disc_work runs
>
> or without needing concurrency
>
> 1. hci_conn_del(conn) entered and finishes
> 2. hci_conn_drop(conn); hci_conn_put(conn); as done in several places
>
> ?

Either way Im not sure what the IDA logic has to with it, that said I
think using ida function is actually a much better solution then the
lookup one so I would be happy to apply it if someone split the
changes related to it and send a patch.

> The hci_conn_del here is not necessarily from hci_abort_conn. Should
> the atomic flag be set in hci_conn_del before
> cancel_delayed_work_sync(&conn->disc_work) to catch possible other
> cases?
>
> > > There are potential races in hci_sync vs. handle reuse since the queued
> > > abort is not canceled if the conn is deleted by something else. But now
> > > I'm not sure syzbot hits this race.
> >
> > Sorry, can you give a specific scenario. I can't understand exactly what you mean.
>
> As noted in the other patch:
>
> 1. hci_conn_abort(conn)
>
> 2. hci_cmd_sync_queue(hdev,abort_conn_sync,UINT_PTR(conn->handle))
>
> 3. something else (e.g. HCI event) deletes conn
>
> 4. something else (e.g. HCI event) creates conn2, with same handle
>    value
>
> 5. Queued abort_conn_sync executes. It needs to be delayed enough,
>    but doesn't need to be concurrent.
>
> 6. abort_conn_sync uses queued handle value, finds conn2 (not conn)
>
> 7. hci_abort_conn_sync(conn2, conn2->abort_reason)
>
> 8. Calling hci_abort_conn_sync with reason == 0 causes UAF
>
> The UAF is because reason==0 is passed to l2cap_disconn_ind which does
> not signal disconnection to the L2CAP layer, hci_abort_conn_sync
> deletes conn immediately after that, and later on L2CAP tries to access
> stale conn pointer.

Are you looking into implementing the cancel logic for abort? Long ago
Ive send a patch to introduce the queue_one logic which does include a
function to lookup into the cmd_sync queue, perhaps we can reuse that
to implement the cancel logic.

> > > >
> > > >           switch (conn->type) {
> > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > index 974631e652c1..f87281b4386f 100644
> > > > --- a/net/bluetooth/hci_conn.c
> > > > +++ b/net/bluetooth/hci_conn.c
> > > > @@ -153,6 +153,9 @@ static void hci_conn_cleanup(struct hci_conn *conn)
> > > >
> > > >
> > > >
> > > >
> > > >   hci_conn_hash_del(hdev, conn);
> > > >
> > > >
> > > >
> > > >
> > > > + if (HCI_CONN_HANDLE_UNSET(conn->handle))
> > > > +         ida_free(&hdev->unset_handle_ida, conn->handle);
> > > > +
> > > >   if (conn->cleanup)
> > > >           conn->cleanup(conn);
> > > >
> > > >
> > > >
> > > >
> > > > @@ -929,29 +932,17 @@ static void cis_cleanup(struct hci_conn *conn)
> > > >   hci_le_remove_cig(hdev, conn->iso_qos.ucast.cig);
> > > >  }
> > > >
> > > >
> > > >
> > > >
> > > > -static u16 hci_conn_hash_alloc_unset(struct hci_dev *hdev)
> > > > +static int hci_conn_hash_alloc_unset(struct hci_dev *hdev)
> > > >  {
> > > > - struct hci_conn_hash *h = &hdev->conn_hash;
> > > > - struct hci_conn  *c;
> > > > - u16 handle = HCI_CONN_HANDLE_MAX + 1;
> > > > -
> > > > - rcu_read_lock();
> > > > -
> > > > - list_for_each_entry_rcu(c, &h->list, list) {
> > > > -         /* Find the first unused handle */
> > > > -         if (handle == 0xffff || c->handle != handle)
> > > > -                 break;
> > > > -         handle++;
> > > > - }
> > > > - rcu_read_unlock();
> > >
> > > The original hci_conn_hash_alloc_unset seems to have wrong logic. It'll
> > > e.g. always return HCI_CONN_HANDLE_MAX+1 except if the first conn in
> > > conn_hash (which is not in particular order) has that handle...
> > >
> > > The code paths adding/deleting conns or entering here / setting handles
> > > should be holding hdev->lock, so probably no concurrency issue in it.
> > >
> > > Is syzbot happy with just this handle allocation fixed?
> >
> > Just fix handle, hci_conn occur UAF in hci_proto_disconn_ind() within hci_conn_timeout() process.
> >
> > >
> > > > -
> > > > - return handle;
> > > > + return ida_alloc_range(&hdev->unset_handle_ida, HCI_CONN_HANDLE_MAX + 1,
> > > > +                        U16_MAX, GFP_ATOMIC);
> > > >  }
> > > >
> > > >
> > > >
> > > >
> > >
>
> --
> Pauli Virtanen

-- 
Luiz Augusto von Dentz