Hi Luiz, On 12/16/2024 10:42 PM, Luiz Augusto von Dentz wrote: > Hi Cheng, > > On Mon, Dec 16, 2024 at 9:32 AM Luiz Augusto von Dentz > <luiz.dentz@xxxxxxxxx> wrote: >> >> Hi Cheng, >> >> On Mon, Dec 16, 2024 at 3:08 AM Cheng Jiang <quic_chejiang@xxxxxxxxxxx> wrote: >>> >>> Sometimes, the remote device doesn't acknowledge the LL_TERMINATE_IND >>> in time, requiring the controller to wait for the supervision timeout, >>> which may exceed 2 seconds. In the current implementation, the >>> HCI_EV_DISCONN_COMPLETE event is ignored if it arrives late, since >>> the hci_abort_conn_sync has cleaned up the connection after 2 seconds. >>> This causes the mgmt to get stuck, resulting in bluetoothd waiting >>> indefinitely for the mgmt response to the disconnect. To recover, >>> restarting bluetoothd is necessary. >>> >>> bluetoothctl log like this: >>> [Designer Mouse]# disconnect D9:B5:6C:F2:51:91 >>> Attempting to disconnect from D9:B5:6C:F2:51:91 >>> [Designer Mouse]# >>> [Designer Mouse]# power off >>> [Designer Mouse]# >>> Failed to set power off: org.freedesktop.DBus.Error.NoReply. >>> >>> Signed-off-by: Cheng Jiang <quic_chejiang@xxxxxxxxxxx> >>> --- >>> include/net/bluetooth/hci_core.h | 2 ++ >>> net/bluetooth/hci_conn.c | 9 +++++++++ >>> net/bluetooth/hci_event.c | 9 +++++++++ >>> net/bluetooth/hci_sync.c | 18 ++++++++++++++++++ >>> 4 files changed, 38 insertions(+) >>> >>> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h >>> index 734cd50cd..2ab079dcf 100644 >>> --- a/include/net/bluetooth/hci_core.h >>> +++ b/include/net/bluetooth/hci_core.h >>> @@ -753,6 +753,8 @@ struct hci_conn { >>> >>> struct bt_codec codec; >>> >>> + struct completion disc_ev_comp; >>> + >>> void (*connect_cfm_cb) (struct hci_conn *conn, u8 status); >>> void (*security_cfm_cb) (struct hci_conn *conn, u8 status); >>> void (*disconn_cfm_cb) (struct hci_conn *conn, u8 reason); >>> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c >>> index d097e308a..e0244e191 100644 >>> --- a/net/bluetooth/hci_conn.c >>> +++ b/net/bluetooth/hci_conn.c >>> @@ -1028,6 +1028,15 @@ static struct hci_conn *__hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t >>> >>> hci_conn_init_sysfs(conn); >>> >>> + /* This disc_ev_comp is inited when we send a disconnect request to >>> + * the remote device but fail to receive the disconnect complete >>> + * event within the expected time (2 seconds). This occurs because >>> + * the remote device doesn't ack the terminate indication, forcing >>> + * the controller to wait for the supervision timeout. >>> + */ >>> + init_completion(&conn->disc_ev_comp); >>> + complete(&conn->disc_ev_comp); >>> + >>> return conn; >>> } >>> >>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c >>> index 2cc7a9306..60ecb2b18 100644 >>> --- a/net/bluetooth/hci_event.c >>> +++ b/net/bluetooth/hci_event.c >>> @@ -3366,6 +3366,15 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, >>> if (!conn) >>> goto unlock; >>> >>> + /* Wake up disc_ev_comp here is ok. Since we hold the hdev lock >>> + * hci_abort_conn_sync will wait hdev lock release to continue. >>> + */ >>> + if (!completion_done(&conn->disc_ev_comp)) { >>> + complete(&conn->disc_ev_comp); >>> + /* Add some delay for hci_abort_conn_sync to handle the complete */ >>> + usleep_range(100, 1000); >>> + } >>> + >>> if (ev->status) { >>> mgmt_disconnect_failed(hdev, &conn->dst, conn->type, >>> conn->dst_type, ev->status); >>> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c >>> index 0badec712..783d04b57 100644 >>> --- a/net/bluetooth/hci_sync.c >>> +++ b/net/bluetooth/hci_sync.c >>> @@ -5590,6 +5590,24 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason) >>> break; >>> } >>> >>> + /* Check whether the connection is successfully disconnected. >>> + * Sometimes the remote device doesn't acknowledge the >>> + * LL_TERMINATE_IND in time, requiring the controller to wait >>> + * for the supervision timeout, which may exceed 2 seconds. In >>> + * this case, we need to wait for the HCI_EV_DISCONN_COMPLETE >>> + * event before cleaning up the connection. >>> + */ >>> + if (err == -ETIMEDOUT) { >>> + u32 idle_delay = msecs_to_jiffies(10 * conn->le_supv_timeout); >>> + >>> + reinit_completion(&conn->disc_ev_comp); >>> + if (!wait_for_completion_timeout(&conn->disc_ev_comp, idle_delay)) { >>> + bt_dev_warn(hdev, "Failed to get complete"); >>> + mgmt_disconnect_failed(hdev, &conn->dst, conn->type, >>> + conn->dst_type, conn->abort_reason); >>> + } >>> + } >> >> Why don't we just set the supervision timeout as timeout then? If we >> will have to wait for it anyway just change hci_disconnect_sync to use >> 10 * conn->le_supv_timeout as timeout instead. >> hci_disconnect_sync uses __hci_cmd_sync_status_sk to wait for the HCI_EV_DISCONN_COMPLETE event, which will send the command in hci_cmd_work. In hci_cmd_work, it will start a timer with HCI_CMD_TIMEOUT(2s) to wait for the event. So even in hci_disconnect_sync we can set the timeout to supervision timeout, hci_disconnect_sync still timeout after 2s. >> That said, we really need to fix bluetoothd if it is not able to be >> cleaned up if SET_POWERED command fails, but it looks like it is >> handling errors correctly so it sounds like something else is at play. > The issue arises after a 2-second timeout of hci_disconnect_sync. During hci_abort_conn_sync, the connection is cleaned up by hci_conn_failed. after supervision timeout, the disconnect complete event arrives, but it returns at line 3370 since the connection has already been removed. As a result, bluetoothd does not send the mgmt event for MGMT_OP_DISCONNECT to the application layer (bluetoothctl), causing bluetoothctl to get stuck and unable to perform other mgmt commands. 3355 static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, 3356 struct sk_buff *skb) 3357 { 3358 struct hci_ev_disconn_complete *ev = data; 3359 u8 reason; 3360 struct hci_conn_params *params; 3361 struct hci_conn *conn; 3362 bool mgmt_connected; 3363 3364 bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); 3365 3366 hci_dev_lock(hdev); 3367 3368 conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); 3369 if (!conn) 3370 goto unlock; 3371 3372 if (ev->status) { 3373 mgmt_disconnect_failed(hdev, &conn->dst, conn->type, 3374 conn->dst_type, ev->status); 3375 goto unlock; 3376 } 3377 3378 conn->state = BT_CLOSED; 3379 3380 mgmt_connected = test_and_clear_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags); 3381 > I double checked this and apparently this could no longer fail: > > + /* Disregard possible errors since hci_conn_del shall have been > + * called even in case of errors had occurred since it would > + * then cause hci_conn_failed to be called which calls > + * hci_conn_del internally. > + */ > + hci_abort_conn_sync(hdev, conn, reason); > > So it will clean up the hci_conn no matter what is the timeout, so > either you don't have this change or it is not working for some > reason. > The issue is mgmt command is not repsonsed by bluetoothd, then the bluetootlctl is blocked. It does not happen during the power off stage. It happened when disconnect a BLE device, but the disconnect complete event sent from controller to host 2s later. Then it causes the mgmt in bluetoothctl is blocked as decribed as above. >>> hci_dev_lock(hdev); >>> >>> /* Check if the connection has been cleaned up concurrently */ >>> >>> base-commit: e25c8d66f6786300b680866c0e0139981273feba >>> -- >>> 2.34.1 >>> >> >> >> -- >> Luiz Augusto von Dentz > > >