Hi Cheng, On Mon, Dec 16, 2024 at 9:32 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote: > > Hi Cheng, > > On Mon, Dec 16, 2024 at 3:08 AM Cheng Jiang <quic_chejiang@xxxxxxxxxxx> wrote: > > > > Sometimes, the remote device doesn't acknowledge the LL_TERMINATE_IND > > in time, requiring the controller to wait for the supervision timeout, > > which may exceed 2 seconds. In the current implementation, the > > HCI_EV_DISCONN_COMPLETE event is ignored if it arrives late, since > > the hci_abort_conn_sync has cleaned up the connection after 2 seconds. > > This causes the mgmt to get stuck, resulting in bluetoothd waiting > > indefinitely for the mgmt response to the disconnect. To recover, > > restarting bluetoothd is necessary. > > > > bluetoothctl log like this: > > [Designer Mouse]# disconnect D9:B5:6C:F2:51:91 > > Attempting to disconnect from D9:B5:6C:F2:51:91 > > [Designer Mouse]# > > [Designer Mouse]# power off > > [Designer Mouse]# > > Failed to set power off: org.freedesktop.DBus.Error.NoReply. > > > > Signed-off-by: Cheng Jiang <quic_chejiang@xxxxxxxxxxx> > > --- > > include/net/bluetooth/hci_core.h | 2 ++ > > net/bluetooth/hci_conn.c | 9 +++++++++ > > net/bluetooth/hci_event.c | 9 +++++++++ > > net/bluetooth/hci_sync.c | 18 ++++++++++++++++++ > > 4 files changed, 38 insertions(+) > > > > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h > > index 734cd50cd..2ab079dcf 100644 > > --- a/include/net/bluetooth/hci_core.h > > +++ b/include/net/bluetooth/hci_core.h > > @@ -753,6 +753,8 @@ struct hci_conn { > > > > struct bt_codec codec; > > > > + struct completion disc_ev_comp; > > + > > void (*connect_cfm_cb) (struct hci_conn *conn, u8 status); > > void (*security_cfm_cb) (struct hci_conn *conn, u8 status); > > void (*disconn_cfm_cb) (struct hci_conn *conn, u8 reason); > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c > > index d097e308a..e0244e191 100644 > > --- a/net/bluetooth/hci_conn.c > > +++ b/net/bluetooth/hci_conn.c > > @@ -1028,6 +1028,15 @@ static struct hci_conn *__hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t > > > > hci_conn_init_sysfs(conn); > > > > + /* This disc_ev_comp is inited when we send a disconnect request to > > + * the remote device but fail to receive the disconnect complete > > + * event within the expected time (2 seconds). This occurs because > > + * the remote device doesn't ack the terminate indication, forcing > > + * the controller to wait for the supervision timeout. > > + */ > > + init_completion(&conn->disc_ev_comp); > > + complete(&conn->disc_ev_comp); > > + > > return conn; > > } > > > > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > > index 2cc7a9306..60ecb2b18 100644 > > --- a/net/bluetooth/hci_event.c > > +++ b/net/bluetooth/hci_event.c > > @@ -3366,6 +3366,15 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, > > if (!conn) > > goto unlock; > > > > + /* Wake up disc_ev_comp here is ok. Since we hold the hdev lock > > + * hci_abort_conn_sync will wait hdev lock release to continue. > > + */ > > + if (!completion_done(&conn->disc_ev_comp)) { > > + complete(&conn->disc_ev_comp); > > + /* Add some delay for hci_abort_conn_sync to handle the complete */ > > + usleep_range(100, 1000); > > + } > > + > > if (ev->status) { > > mgmt_disconnect_failed(hdev, &conn->dst, conn->type, > > conn->dst_type, ev->status); > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > index 0badec712..783d04b57 100644 > > --- a/net/bluetooth/hci_sync.c > > +++ b/net/bluetooth/hci_sync.c > > @@ -5590,6 +5590,24 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason) > > break; > > } > > > > + /* Check whether the connection is successfully disconnected. > > + * Sometimes the remote device doesn't acknowledge the > > + * LL_TERMINATE_IND in time, requiring the controller to wait > > + * for the supervision timeout, which may exceed 2 seconds. In > > + * this case, we need to wait for the HCI_EV_DISCONN_COMPLETE > > + * event before cleaning up the connection. > > + */ > > + if (err == -ETIMEDOUT) { > > + u32 idle_delay = msecs_to_jiffies(10 * conn->le_supv_timeout); > > + > > + reinit_completion(&conn->disc_ev_comp); > > + if (!wait_for_completion_timeout(&conn->disc_ev_comp, idle_delay)) { > > + bt_dev_warn(hdev, "Failed to get complete"); > > + mgmt_disconnect_failed(hdev, &conn->dst, conn->type, > > + conn->dst_type, conn->abort_reason); > > + } > > + } > > Why don't we just set the supervision timeout as timeout then? If we > will have to wait for it anyway just change hci_disconnect_sync to use > 10 * conn->le_supv_timeout as timeout instead. > > That said, we really need to fix bluetoothd if it is not able to be > cleaned up if SET_POWERED command fails, but it looks like it is > handling errors correctly so it sounds like something else is at play. I double checked this and apparently this could no longer fail: + /* Disregard possible errors since hci_conn_del shall have been + * called even in case of errors had occurred since it would + * then cause hci_conn_failed to be called which calls + * hci_conn_del internally. + */ + hci_abort_conn_sync(hdev, conn, reason); So it will clean up the hci_conn no matter what is the timeout, so either you don't have this change or it is not working for some reason. > > hci_dev_lock(hdev); > > > > /* Check if the connection has been cleaned up concurrently */ > > > > base-commit: e25c8d66f6786300b680866c0e0139981273feba > > -- > > 2.34.1 > > > > > -- > Luiz Augusto von Dentz -- Luiz Augusto von Dentz