Hi Marcel, > From: Marcel Holtmann [mailto:marcel@xxxxxxxxxxxx] > Sent: Wednesday, November 23, 2016 1:46 PM > To: Amitkumar Karwar > Cc: linux-bluetooth@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > Cathy Luo; Nishant Sarmukadam; Ganapathi Bhat > Subject: Re: [PATCH] btusb: fix zero BD address problem during stress > test > > Hi Amitkumar, > > >>>> From: Amitkumar Karwar [mailto:akarwar@xxxxxxxxxxx] > >>>> Sent: Tuesday, October 18, 2016 6:27 PM > >>>> To: linux-bluetooth@xxxxxxxxxxxxxxx > >>>> Cc: marcel@xxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Cathy Luo; > >>>> Nishant Sarmukadam; Ganapathi Bhat; Amitkumar Karwar > >>>> Subject: [PATCH] btusb: fix zero BD address problem during stress > >>>> test > >>>> > >>>> From: Ganapathi Bhat <gbhat@xxxxxxxxxxx> > >>>> > >>>> We came across a corner case issue during reboot stress test in > >> which > >>>> hciconfig shows BD address is all zero. Reason is we don't get > >>>> response for HCI RESET command during initialization > >>>> > >>>> The issue is tracked to a race where USB subsystem calls > >>>> btusb_intr_complete() to deliver a data(NOOP frame) received on > >>>> interrupt endpoint. HCI_RUNNING flag is not yet set by bluetooth > >>>> subsystem. So we ignore that frame and return. > >>>> > >>>> As we missed to resubmit the buffer to interrupt endpoint in this > >>>> case, we don't get response for BT reset command downloaded after > >> this. > >>>> > >>>> This patch handles the corner case to resolve zero BD address > >> problem. > >>>> > >>>> Signed-off-by: Ganapathi Bhat <gbhat@xxxxxxxxxxx> > >>>> Signed-off-by: Amitkumar Karwar <akarwar@xxxxxxxxxxx> > >>>> --- > >>>> drivers/bluetooth/btusb.c | 5 +---- > >>>> 1 file changed, 1 insertion(+), 4 deletions(-) > >>>> > >>>> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c > >>>> index 811f9b9..b5596ac 100644 > >>>> --- a/drivers/bluetooth/btusb.c > >>>> +++ b/drivers/bluetooth/btusb.c > >>>> @@ -607,10 +607,7 @@ static void btusb_intr_complete(struct urb > >> *urb) > >>>> BT_DBG("%s urb %p status %d count %d", hdev->name, urb, urb- > >>>>> status, > >>>> urb->actual_length); > >>>> > >>>> - if (!test_bit(HCI_RUNNING, &hdev->flags)) > >>>> - return; > >>>> - > >>>> - if (urb->status == 0) { > >>>> + if (urb->status == 0 && test_bit(HCI_RUNNING, &hdev- > >flags)) { > >>>> hdev->stat.byte_rx += urb->actual_length; > >>>> > >>>> if (btusb_recv_intr(data, urb->transfer_buffer, > >>> > >>> Did you get a chance to check this? > >>> Please let us know if you have any review comments. > >> > >> can you explain how this is correct and show me the HCI traces for > >> this. > >> > > > > I suppose HCI trace means hcidump logs here. As device hasn't yet > initialized, hcidump won't show anything. > > We had added debug info in btusb driver to trace the data received on > all USB endpoints and also checked usbmon logs. > > use btmon and it will show it. > > > Here is the sequence of events we observed in a corner case while > running stress test. > > 1) Inside btusb_open() call ------ Thread 1 > > 2) btusb_submit_intr_urb() submits the URB for receiving data on > > interrupt endpoint ---- Thread 1 > > 3) btusb_intr_complete() gets called to deliver NOP frame from HCI > > controller ---- Thread 2 > > 4) HCI_RUNNING isn't set yet. So we return from btusb_intr_complete() > > without resubmitting the buffer --- Thread 2 > > 5) Exit btusb_open() ---- Thread 1 > > 6) "set_bit(HCI_RUNNING, &hdev->flags)" done by bluetooth core ---- > > Thread 1 > > > > Later HCI_RESET command gets timedout, as we haven't re-submitted > buffer for interrupt endpoint in step (4) above. > > > > Please find attached logs. usbmon log shows first frame received on > interrupt endpoint is NOP(Search for Marvell in log). > > > > Here is what bluetooth spec says about NOP frame. > > > > "To indicate to the Host that the Controller is ready to receive HCI > > command packets, the Controller generates a Command Status event with > > Status 0x00 and Command_Opcode 0x0000, and the > Num_HCI_Command_Packets > > event parameter is set to 1 or more. Command_Opcode, 0x0000 is a NOP > (No Operation)” > > So I wonder if we need to remove the HCI_RUNNING logic from the > drivers. It is only left in a few USB drivers and I removed the rest > and moved it into the core. I am not in favour of papering over this > issue. I need to understand what is wrong. And actually HCI_RUNNING > needs to be taken away from the drivers. So the question is if btusb.c > still needs it or if that is just a leftover. Meaning is it protecting > anything? > I can see HCI_RUNNING check being used at below places in btusb.c 1) btusb_intr_complete 2) btusb_bulk_complete 3) btusb_isoc_complete 4) btusb_tx_complete 5) btusb_isoc_tx_complete 6) btusb_bcm_set_diag 7) btusb_resume I think, it can be removed from (1), (2) and (3), as we already have below check inside hci_recv_frame() if (!hdev || (!test_bit(HCI_UP, &hdev->flags) && !test_bit(HCI_INIT, &hdev->flags))) { kfree_skb(skb); return -ENXIO; } For (4) and (5), it's just used to decide if we should update "hdev->stat" or not. Regards, Amitkumar ��.n��������+%������w��{.n�����{����^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�