Hi Cheng, On Mon, Dec 16, 2024 at 3:08 AM Cheng Jiang <quic_chejiang@xxxxxxxxxxx> wrote: > > Sometimes, the remote device doesn't acknowledge the LL_TERMINATE_IND > in time, requiring the controller to wait for the supervision timeout, > which may exceed 2 seconds. In the current implementation, the > HCI_EV_DISCONN_COMPLETE event is ignored if it arrives late, since > the hci_abort_conn_sync has cleaned up the connection after 2 seconds. > This causes the mgmt to get stuck, resulting in bluetoothd waiting > indefinitely for the mgmt response to the disconnect. To recover, > restarting bluetoothd is necessary. > > bluetoothctl log like this: > [Designer Mouse]# disconnect D9:B5:6C:F2:51:91 > Attempting to disconnect from D9:B5:6C:F2:51:91 > [Designer Mouse]# > [Designer Mouse]# power off > [Designer Mouse]# > Failed to set power off: org.freedesktop.DBus.Error.NoReply. > > Signed-off-by: Cheng Jiang <quic_chejiang@xxxxxxxxxxx> > --- > include/net/bluetooth/hci_core.h | 2 ++ > net/bluetooth/hci_conn.c | 9 +++++++++ > net/bluetooth/hci_event.c | 9 +++++++++ > net/bluetooth/hci_sync.c | 18 ++++++++++++++++++ > 4 files changed, 38 insertions(+) > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h > index 734cd50cd..2ab079dcf 100644 > --- a/include/net/bluetooth/hci_core.h > +++ b/include/net/bluetooth/hci_core.h > @@ -753,6 +753,8 @@ struct hci_conn { > > struct bt_codec codec; > > + struct completion disc_ev_comp; > + > void (*connect_cfm_cb) (struct hci_conn *conn, u8 status); > void (*security_cfm_cb) (struct hci_conn *conn, u8 status); > void (*disconn_cfm_cb) (struct hci_conn *conn, u8 reason); > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c > index d097e308a..e0244e191 100644 > --- a/net/bluetooth/hci_conn.c > +++ b/net/bluetooth/hci_conn.c > @@ -1028,6 +1028,15 @@ static struct hci_conn *__hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t > > hci_conn_init_sysfs(conn); > > + /* This disc_ev_comp is inited when we send a disconnect request to > + * the remote device but fail to receive the disconnect complete > + * event within the expected time (2 seconds). This occurs because > + * the remote device doesn't ack the terminate indication, forcing > + * the controller to wait for the supervision timeout. > + */ > + init_completion(&conn->disc_ev_comp); > + complete(&conn->disc_ev_comp); > + > return conn; > } > > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > index 2cc7a9306..60ecb2b18 100644 > --- a/net/bluetooth/hci_event.c > +++ b/net/bluetooth/hci_event.c > @@ -3366,6 +3366,15 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, > if (!conn) > goto unlock; > > + /* Wake up disc_ev_comp here is ok. Since we hold the hdev lock > + * hci_abort_conn_sync will wait hdev lock release to continue. > + */ > + if (!completion_done(&conn->disc_ev_comp)) { > + complete(&conn->disc_ev_comp); > + /* Add some delay for hci_abort_conn_sync to handle the complete */ > + usleep_range(100, 1000); > + } > + > if (ev->status) { > mgmt_disconnect_failed(hdev, &conn->dst, conn->type, > conn->dst_type, ev->status); > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > index 0badec712..783d04b57 100644 > --- a/net/bluetooth/hci_sync.c > +++ b/net/bluetooth/hci_sync.c > @@ -5590,6 +5590,24 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason) > break; > } > > + /* Check whether the connection is successfully disconnected. > + * Sometimes the remote device doesn't acknowledge the > + * LL_TERMINATE_IND in time, requiring the controller to wait > + * for the supervision timeout, which may exceed 2 seconds. In > + * this case, we need to wait for the HCI_EV_DISCONN_COMPLETE > + * event before cleaning up the connection. > + */ > + if (err == -ETIMEDOUT) { > + u32 idle_delay = msecs_to_jiffies(10 * conn->le_supv_timeout); > + > + reinit_completion(&conn->disc_ev_comp); > + if (!wait_for_completion_timeout(&conn->disc_ev_comp, idle_delay)) { > + bt_dev_warn(hdev, "Failed to get complete"); > + mgmt_disconnect_failed(hdev, &conn->dst, conn->type, > + conn->dst_type, conn->abort_reason); > + } > + } Why don't we just set the supervision timeout as timeout then? If we will have to wait for it anyway just change hci_disconnect_sync to use 10 * conn->le_supv_timeout as timeout instead. That said, we really need to fix bluetoothd if it is not able to be cleaned up if SET_POWERED command fails, but it looks like it is handling errors correctly so it sounds like something else is at play. > hci_dev_lock(hdev); > > /* Check if the connection has been cleaned up concurrently */ > > base-commit: e25c8d66f6786300b680866c0e0139981273feba > -- > 2.34.1 > -- Luiz Augusto von Dentz