Hi Manish, > During command status or command complete event, the controller may set > ncmd=0 indicating that it is not accepting any more commands. In such a > case, host holds off sending any more commands to the controller. If the > controller doesn't recover from such condition, host will wait forever. > > This patch adds a timer when controller gets into such condition and > resets the controller if controller doesn't recover within the timeout > period. > > Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@xxxxxxxxxxxx> > Signed-off-by: Manish Mandlik <mmandlik@xxxxxxxxxx> > --- > Hello Maintainers, > > We noticed that during suspend, sometimes the controller firmware gets > into a state where it is not accepting any more commands (it returns > ncmd=0 in Command Status): > > < HCI Command: Disconnect (0x01|0x0006) plen 3 #398 [hci0] 83.760502 > Handle: 1 > Reason: Remote Device Terminated due to Power Off (0x15) >> HCI Event: Command Status (0x0f) plen 4 #399 [hci0] 83.761694 > Disconnect (0x01|0x0006) ncmd 0 > Status: Success (0x00) > > In such a case, the host holds off sending any more packets to the > controller until it is ready to accept more commands. If the controller > doesn't recover from such a condition, Command Timeout does not get > triggered as Command Timeout is queued only once the packet is sent to > the controller; hence, the host will wait forever. > > This patch adds a timer to recover from this condition. Since the > suspend timeout is 2 seconds, I'm using 4 seconds timeout to recover > from ncmd=0. This should give ample amount of time for recovery and > should not create any race conditions with the suspend. Once we resume > from the suspend normally, the timer would expire and reset the > controller. I have verified this patch locally and able to connect to > peer device after resume from suspend. Please let me know your thoughts > on this. > > Thanks, > Manish. > > include/net/bluetooth/hci.h | 1 + > include/net/bluetooth/hci_core.h | 1 + > net/bluetooth/hci_core.c | 15 +++++++++++++++ > net/bluetooth/hci_event.c | 10 ++++++++++ > 4 files changed, 27 insertions(+) > > diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h > index ea4ae551c426..c4b0650fb9ae 100644 > --- a/include/net/bluetooth/hci.h > +++ b/include/net/bluetooth/hci.h > @@ -339,6 +339,7 @@ enum { > #define HCI_PAIRING_TIMEOUT msecs_to_jiffies(60000) /* 60 seconds */ > #define HCI_INIT_TIMEOUT msecs_to_jiffies(10000) /* 10 seconds */ > #define HCI_CMD_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */ > +#define HCI_NCMD_TIMEOUT msecs_to_jiffies(4000) /* 4 seconds */ > #define HCI_ACL_TX_TIMEOUT msecs_to_jiffies(45000) /* 45 seconds */ > #define HCI_AUTO_OFF_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */ > #define HCI_POWER_OFF_TIMEOUT msecs_to_jiffies(5000) /* 5 seconds */ > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h > index ebdd4afe30d2..f14692b39fd5 100644 > --- a/include/net/bluetooth/hci_core.h > +++ b/include/net/bluetooth/hci_core.h > @@ -470,6 +470,7 @@ struct hci_dev { > struct delayed_work service_cache; > > struct delayed_work cmd_timer; > + struct delayed_work ncmd_timer; > > struct work_struct rx_work; > struct work_struct cmd_work; > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index b0d9c36acc03..5ee1609456bd 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -2769,6 +2769,20 @@ static void hci_cmd_timeout(struct work_struct *work) > queue_work(hdev->workqueue, &hdev->cmd_work); > } > > +/* HCI ncmd timer function */ > +static void hci_ncmd_timeout(struct work_struct *work) > +{ > + struct hci_dev *hdev = container_of(work, struct hci_dev, > + ncmd_timer.work); > + > + bt_dev_err(hdev, "ncmd timeout"); > + > + if (hci_dev_do_close(hdev)) > + return; > + > + hci_dev_do_open(hdev); > +} > + I am pretty certain this can dead-lock if ncmd=0 happens inside hci_dev_do_open,do_close itself. The second thing is that do_close+do_open is heavy hammer you are swinging here. It will also result in mgmt powered down/up. Is this something you really want since bluetoothd will notice this and has to re-init everything. Regards Marcel