Hi Manish, On Thu, Aug 12, 2021 at 3:58 AM K, Kiran <kiran.k@xxxxxxxxx> wrote: > > Hi Marcel, > > > -----Original Message----- > > From: K, Kiran > > Sent: Friday, August 6, 2021 8:14 PM > > To: 'Marcel Holtmann' <marcel@xxxxxxxxxxxx> > > Cc: BlueZ <linux-bluetooth@xxxxxxxxxxxxxxx>; Srivatsa, Ravishankar > > <ravishankar.srivatsa@xxxxxxxxx>; Tumkur Narayan, Chethan > > <chethan.tumkur.narayan@xxxxxxxxx> > > Subject: RE: [PATCH v1] Bluetooth: Fix race condition in handling NOP > > command > > > > Hi Marcel, > > > > > -----Original Message----- > > > From: Marcel Holtmann <marcel@xxxxxxxxxxxx> > > > Sent: Thursday, August 5, 2021 6:41 PM > > > To: K, Kiran <kiran.k@xxxxxxxxx> > > > Cc: BlueZ <linux-bluetooth@xxxxxxxxxxxxxxx>; Srivatsa, Ravishankar > > > <ravishankar.srivatsa@xxxxxxxxx>; Tumkur Narayan, Chethan > > > <chethan.tumkur.narayan@xxxxxxxxx> > > > Subject: Re: [PATCH v1] Bluetooth: Fix race condition in handling NOP > > > command > > > > > > Hi Kiran, > > > > > > > For NOP command, need to cancel work scheduled on cmd_timer, on > > > > receiving command status or commmand complete event. > > > > > > > > Below use case might lead to race condition multiple when NOP > > > > commands are queued sequentially: > > > > > > > > hci_cmd_work() { > > > > if (atomic_read(&hdev->cmd_cnt) { > > > > . > > > > . > > > > . > > > > atomic_dec(&hdev->cmd_cnt); > > > > hci_send_frame(hdev,...); > > > > schedule_delayed_work(&hdev->cmd_timer,...); > > > > } > > > > } > > > > > > > > On receiving event for first NOP, the work scheduled on > > > > hdev->cmd_timer is not cancelled and second NOP is dequeued and > > > > hdev->sent > > > to controller. > > > > > > > > While waiting for an event for second NOP command, work scheduled on > > > > cmd_timer for first NOP can get scheduled, resulting in sending > > > > third NOP command not waiting for an event for second NOP. This > > > > might cause issues at controller side (like memory overrun, > > > > controller going > > > > unresponsive) resulting in hci tx timeouts, hardware errors etc. > > > > > > > > Signed-off-by: Kiran K <kiran.k@xxxxxxxxx> > > > > Reviewed-by: Chethan T N <chethan.tumkur.narayan@xxxxxxxxx> > > > > Reviewed-by: Srivatsa Ravishankar <ravishankar.srivatsa@xxxxxxxxx> > > > > --- > > > > net/bluetooth/hci_event.c | 3 +-- > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c > > > > index ea7fc09478be..14dfbdc8b81b 100644 > > > > --- a/net/bluetooth/hci_event.c > > > > +++ b/net/bluetooth/hci_event.c > > > > @@ -3271,8 +3271,7 @@ static void hci_remote_features_evt(struct > > > > hci_dev *hdev, static inline void handle_cmd_cnt_and_timer(struct > > > > hci_dev > > > *hdev, > > > > u16 opcode, u8 ncmd) > > > > { > > > > - if (opcode != HCI_OP_NOP) > > > > - cancel_delayed_work(&hdev->cmd_timer); > > > > + cancel_delayed_work(&hdev->cmd_timer); > > > > > > > > if (!test_bit(HCI_RESET, &hdev->flags)) { > > > > if (ncmd) { > > > > > > so this is conflicting with the patch introducing the ncmd timeout handling. > > > > > My patch specifically addresses the issue observed in case of NOP command. > > It prevents the issue by handling NOP same as any other SIG command. > > > > It looks commit de75cd0d9b2f3250d5f25846bb5632ccce6275f4 tries to > > recover when controller goes bad. > > > > Do you have any further comments here ? Waiting for your input. > > > > commit de75cd0d9b2f3250d5f25846bb5632ccce6275f4 > > > Author: Manish Mandlik <mmandlik@xxxxxxxxxx> > > > Date: Thu Apr 29 10:24:22 2021 -0700 > > > > > > Bluetooth: Add ncmd=0 recovery handling > > > > > > During command status or command complete event, the controller > > > may set > > > ncmd=0 indicating that it is not accepting any more commands. In such a > > > case, host holds off sending any more commands to the controller. If the > > > controller doesn't recover from such condition, host will wait forever, > > > until the user decides that the Bluetooth is broken and may power cycles > > > the Bluetooth. > > > > > > This patch triggers the hardware error to reset the controller and > > > driver when it gets into such state as there is no other wat out. > > > > > > Nowhere in your commit description you are addressing why is this the > > > right to do. > > > > > > > Will fix it in the next version if you are OK with the current fix. Please let me > > know. Can you confirm this change doesn't break your patch above? > > > > > Regards > > > > > > Marcel > > > > Thanks, > > Kiran > > Thanks, > Kiran > > -- Luiz Augusto von Dentz