Re: [PATCH v2] Bluetooth: Fix UAF for hci_chan

Pauli Virtanen <pav@xxxxxx> · Wed, 11 Oct 2023 19:33:55 +0000

Hi Luiz,

ti, 2023-10-10 kello 11:51 -0700, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
> 
> On Tue, Oct 10, 2023 at 11:32 AM Pauli Virtanen <pav@xxxxxx> wrote:
> > 
> > Hi,
> > 
> > ti, 2023-10-10 kello 20:49 +0800, Ziyang Xuan (William)
> > kirjoitti:[clip]
> > > > > 
> > > > > - if (atomic_dec_and_test(&conn->refcnt)) {
> > > > > + if (atomic_dec_and_test(&conn->refcnt) &&
> > > > > +     !test_bit(HCI_CONN_DISC, &conn->flags)) {
> > > > >           unsigned long timeo;
> > > > 
> > > > hci_abort_conn already checks if conn->abort_reason was already set, to
> > > > avoid queuing abort for the same conn again. Does this flag not try to
> > > > do the same thing?
> > > 
> > > That is not enough. hci_conn occur UAF in hci_proto_disconn_ind() before enter
> > > hci_abort_conn().
> > 
> > Thanks, this was not clear to me from the patch.
> > 
> > So the problem is that the cancel_delayed_work_sync(&conn->disc_work)
> > in hci_conn_del doesn't help in a few cases:
> > 
> > 1. [task A] hci_conn_del(conn) entered
> > 2. [A] cancel_delayed_work_sync(conn->disc_work)
> > 3. [B] concurrent hci_conn_drop(conn) queues disc_work again
> > 4. [A] hci_conn_del finishes
> > 5. UAF when disc_work runs
> > 
> > or without needing concurrency
> > 
> > 1. hci_conn_del(conn) entered and finishes
> > 2. hci_conn_drop(conn); hci_conn_put(conn); as done in several places
> > 
> > ?
> 
> Either way Im not sure what the IDA logic has to with it, that said I
> think using ida function is actually a much better solution then the
> lookup one so I would be happy to apply it if someone split the
> changes related to it and send a patch.
> 
> > The hci_conn_del here is not necessarily from hci_abort_conn. Should
> > the atomic flag be set in hci_conn_del before
> > cancel_delayed_work_sync(&conn->disc_work) to catch possible other
> > cases?
> > 
> > > > There are potential races in hci_sync vs. handle reuse since the queued
> > > > abort is not canceled if the conn is deleted by something else. But now
> > > > I'm not sure syzbot hits this race.
> > > 
> > > Sorry, can you give a specific scenario. I can't understand exactly what you mean.
> > 
> > As noted in the other patch:
> > 
> > 1. hci_conn_abort(conn)
> > 
> > 2. hci_cmd_sync_queue(hdev,abort_conn_sync,UINT_PTR(conn->handle))
> > 
> > 3. something else (e.g. HCI event) deletes conn
> > 
> > 4. something else (e.g. HCI event) creates conn2, with same handle
> >    value
> > 
> > 5. Queued abort_conn_sync executes. It needs to be delayed enough,
> >    but doesn't need to be concurrent.
> > 
> > 6. abort_conn_sync uses queued handle value, finds conn2 (not conn)
> > 
> > 7. hci_abort_conn_sync(conn2, conn2->abort_reason)
> > 
> > 8. Calling hci_abort_conn_sync with reason == 0 causes UAF
> > 
> > The UAF is because reason==0 is passed to l2cap_disconn_ind which does
> > not signal disconnection to the L2CAP layer, hci_abort_conn_sync
> > deletes conn immediately after that, and later on L2CAP tries to access
> > stale conn pointer.
> 
> Are you looking into implementing the cancel logic for abort? Long ago
> Ive send a patch to introduce the queue_one logic which does include a
> function to lookup into the cmd_sync queue, perhaps we can reuse that
> to implement the cancel logic.

I have a patchset for this, but needs still some more testing.

> > > > > 
> > > > >           switch (conn->type) {
> > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > > > > index 974631e652c1..f87281b4386f 100644
> > > > > --- a/net/bluetooth/hci_conn.c
> > > > > +++ b/net/bluetooth/hci_conn.c
> > > > > @@ -153,6 +153,9 @@ static void hci_conn_cleanup(struct hci_conn *conn)
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > >   hci_conn_hash_del(hdev, conn);
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > + if (HCI_CONN_HANDLE_UNSET(conn->handle))
> > > > > +         ida_free(&hdev->unset_handle_ida, conn->handle);
> > > > > +
> > > > >   if (conn->cleanup)
> > > > >           conn->cleanup(conn);
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > @@ -929,29 +932,17 @@ static void cis_cleanup(struct hci_conn *conn)
> > > > >   hci_le_remove_cig(hdev, conn->iso_qos.ucast.cig);
> > > > >  }
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > -static u16 hci_conn_hash_alloc_unset(struct hci_dev *hdev)
> > > > > +static int hci_conn_hash_alloc_unset(struct hci_dev *hdev)
> > > > >  {
> > > > > - struct hci_conn_hash *h = &hdev->conn_hash;
> > > > > - struct hci_conn  *c;
> > > > > - u16 handle = HCI_CONN_HANDLE_MAX + 1;
> > > > > -
> > > > > - rcu_read_lock();
> > > > > -
> > > > > - list_for_each_entry_rcu(c, &h->list, list) {
> > > > > -         /* Find the first unused handle */
> > > > > -         if (handle == 0xffff || c->handle != handle)
> > > > > -                 break;
> > > > > -         handle++;
> > > > > - }
> > > > > - rcu_read_unlock();
> > > > 
> > > > The original hci_conn_hash_alloc_unset seems to have wrong logic. It'll
> > > > e.g. always return HCI_CONN_HANDLE_MAX+1 except if the first conn in
> > > > conn_hash (which is not in particular order) has that handle...
> > > > 
> > > > The code paths adding/deleting conns or entering here / setting handles
> > > > should be holding hdev->lock, so probably no concurrency issue in it.
> > > > 
> > > > Is syzbot happy with just this handle allocation fixed?
> > > 
> > > Just fix handle, hci_conn occur UAF in hci_proto_disconn_ind() within hci_conn_timeout() process.
> > > 
> > > > 
> > > > > -
> > > > > - return handle;
> > > > > + return ida_alloc_range(&hdev->unset_handle_ida, HCI_CONN_HANDLE_MAX + 1,
> > > > > +                        U16_MAX, GFP_ATOMIC);
> > > > >  }
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > 
> > --
> > Pauli Virtanen
> 
> 
> 

-- 
Pauli Virtanen