Hi Luiz, ti, 2023-10-10 kello 11:51 -0700, Luiz Augusto von Dentz kirjoitti: > Hi Pauli, > > On Tue, Oct 10, 2023 at 11:32 AM Pauli Virtanen <pav@xxxxxx> wrote: > > > > Hi, > > > > ti, 2023-10-10 kello 20:49 +0800, Ziyang Xuan (William) > > kirjoitti:[clip] > > > > > > > > > > - if (atomic_dec_and_test(&conn->refcnt)) { > > > > > + if (atomic_dec_and_test(&conn->refcnt) && > > > > > + !test_bit(HCI_CONN_DISC, &conn->flags)) { > > > > > unsigned long timeo; > > > > > > > > hci_abort_conn already checks if conn->abort_reason was already set, to > > > > avoid queuing abort for the same conn again. Does this flag not try to > > > > do the same thing? > > > > > > That is not enough. hci_conn occur UAF in hci_proto_disconn_ind() before enter > > > hci_abort_conn(). > > > > Thanks, this was not clear to me from the patch. > > > > So the problem is that the cancel_delayed_work_sync(&conn->disc_work) > > in hci_conn_del doesn't help in a few cases: > > > > 1. [task A] hci_conn_del(conn) entered > > 2. [A] cancel_delayed_work_sync(conn->disc_work) > > 3. [B] concurrent hci_conn_drop(conn) queues disc_work again > > 4. [A] hci_conn_del finishes > > 5. UAF when disc_work runs > > > > or without needing concurrency > > > > 1. hci_conn_del(conn) entered and finishes > > 2. hci_conn_drop(conn); hci_conn_put(conn); as done in several places > > > > ? > > Either way Im not sure what the IDA logic has to with it, that said I > think using ida function is actually a much better solution then the > lookup one so I would be happy to apply it if someone split the > changes related to it and send a patch. > > > The hci_conn_del here is not necessarily from hci_abort_conn. Should > > the atomic flag be set in hci_conn_del before > > cancel_delayed_work_sync(&conn->disc_work) to catch possible other > > cases? > > > > > > There are potential races in hci_sync vs. handle reuse since the queued > > > > abort is not canceled if the conn is deleted by something else. But now > > > > I'm not sure syzbot hits this race. > > > > > > Sorry, can you give a specific scenario. I can't understand exactly what you mean. > > > > As noted in the other patch: > > > > 1. hci_conn_abort(conn) > > > > 2. hci_cmd_sync_queue(hdev,abort_conn_sync,UINT_PTR(conn->handle)) > > > > 3. something else (e.g. HCI event) deletes conn > > > > 4. something else (e.g. HCI event) creates conn2, with same handle > > value > > > > 5. Queued abort_conn_sync executes. It needs to be delayed enough, > > but doesn't need to be concurrent. > > > > 6. abort_conn_sync uses queued handle value, finds conn2 (not conn) > > > > 7. hci_abort_conn_sync(conn2, conn2->abort_reason) > > > > 8. Calling hci_abort_conn_sync with reason == 0 causes UAF > > > > The UAF is because reason==0 is passed to l2cap_disconn_ind which does > > not signal disconnection to the L2CAP layer, hci_abort_conn_sync > > deletes conn immediately after that, and later on L2CAP tries to access > > stale conn pointer. > > Are you looking into implementing the cancel logic for abort? Long ago > Ive send a patch to introduce the queue_one logic which does include a > function to lookup into the cmd_sync queue, perhaps we can reuse that > to implement the cancel logic. I have a patchset for this, but needs still some more testing. > > > > > > > > > > switch (conn->type) { > > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c > > > > > index 974631e652c1..f87281b4386f 100644 > > > > > --- a/net/bluetooth/hci_conn.c > > > > > +++ b/net/bluetooth/hci_conn.c > > > > > @@ -153,6 +153,9 @@ static void hci_conn_cleanup(struct hci_conn *conn) > > > > > > > > > > > > > > > > > > > > > > > > > hci_conn_hash_del(hdev, conn); > > > > > > > > > > > > > > > > > > > > > > > > > + if (HCI_CONN_HANDLE_UNSET(conn->handle)) > > > > > + ida_free(&hdev->unset_handle_ida, conn->handle); > > > > > + > > > > > if (conn->cleanup) > > > > > conn->cleanup(conn); > > > > > > > > > > > > > > > > > > > > > > > > > @@ -929,29 +932,17 @@ static void cis_cleanup(struct hci_conn *conn) > > > > > hci_le_remove_cig(hdev, conn->iso_qos.ucast.cig); > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > -static u16 hci_conn_hash_alloc_unset(struct hci_dev *hdev) > > > > > +static int hci_conn_hash_alloc_unset(struct hci_dev *hdev) > > > > > { > > > > > - struct hci_conn_hash *h = &hdev->conn_hash; > > > > > - struct hci_conn *c; > > > > > - u16 handle = HCI_CONN_HANDLE_MAX + 1; > > > > > - > > > > > - rcu_read_lock(); > > > > > - > > > > > - list_for_each_entry_rcu(c, &h->list, list) { > > > > > - /* Find the first unused handle */ > > > > > - if (handle == 0xffff || c->handle != handle) > > > > > - break; > > > > > - handle++; > > > > > - } > > > > > - rcu_read_unlock(); > > > > > > > > The original hci_conn_hash_alloc_unset seems to have wrong logic. It'll > > > > e.g. always return HCI_CONN_HANDLE_MAX+1 except if the first conn in > > > > conn_hash (which is not in particular order) has that handle... > > > > > > > > The code paths adding/deleting conns or entering here / setting handles > > > > should be holding hdev->lock, so probably no concurrency issue in it. > > > > > > > > Is syzbot happy with just this handle allocation fixed? > > > > > > Just fix handle, hci_conn occur UAF in hci_proto_disconn_ind() within hci_conn_timeout() process. > > > > > > > > > > > > - > > > > > - return handle; > > > > > + return ida_alloc_range(&hdev->unset_handle_ida, HCI_CONN_HANDLE_MAX + 1, > > > > > + U16_MAX, GFP_ATOMIC); > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Pauli Virtanen > > > -- Pauli Virtanen