On Thu, 2021-11-04 at 09:28 -0400, Alan Stern wrote: > On Thu, Nov 04, 2021 at 10:34:22AM +0100, Benjamin Berg wrote: > > Hi Marcel and Alan, > > > > On Wed, 2021-11-03 at 20:31 +0100, Marcel Holtmann wrote: > > > > I'm not familiar with the btusb driver, so someone on the > > > > linux-bluetooth mailing list would have a better idea about this. > > > > However, it does look as though btusb keeps the device locked during the > > > > entire 10-second period while it tries to send over the firmware, and it > > > > doesn't abort the procedure when it starts getting disconnection errors > > > > but instead persists until a timeout expires. Keeping the device locked > > > > would certainly block lsusb. > > > > > > > > In general, locking the device during a firmware upload seems like > > > > the right thing to do -- you don't want extraneous transfers from > > > > other processes messing up the firmware! So overall, it appears that > > > > the whole problem would be solved if the firmware transfer were > > > > aborted as soon as the -ENODEV errors start appearing. > > > > > > the problem seems to be that we hitting HCI command timeout. So the > > > firmware download is done via HCI commands. These commands are send > > > to the transport driver btusb.c via hdev->send (as btusb_send_frame). > > > This triggers the usb_submit_urb or queues them via data->deferred > > > anchor. All this reports back the error properly except that nobody > > > does anything with it. > > > > > > See hci_send_frame() last portion: > > > > > > err = hdev->send(hdev, skb); > > > if (err < 0) { > > > bt_dev_err(hdev, "sending frame failed (%d)", err); > > > kfree_skb(skb); > > > } > > > > > > And that is it. We are not checking for ENODEV or any error here. > > > That means the failure of the HCI command gets only caught via the > > > HCI command timeout. I don’t know how to do this yet, but you would > > > have to look there to fail HCI command right away instead of waiting > > > for the timeout. > > > > Hmm, true, I don't see a "sending frame failed" error message during > > the firmware download though. > > It is in the log you posted: > > [Mi Nov 3 11:55:23 2021] Bluetooth: hci0: Failed to send firmware data (-110) > [Mi Nov 3 11:55:23 2021] Bluetooth: hci0: sending frame failed (-19) > > But this occurred after the timeout, so maybe you had in mind something > occurring earlier. Yep, that one happens when the driver tries to reset the device. We need to catch the error earlier in order to avoid the 10s wait. > > You are right that this codepath is > > loosing the error, but this does not seem to be the scenario we are > > running into while loading the firmware. This error only happens later > > on from the btintel_reset_to_bootloader function. > > > > What seems to happen in the posted log is that the URB is initially > > submitted just fine and the transfer errors out afterwards. > > Unfortunately, the btusb_tx_complete is only used for statistics > > (stat.err_tx is increased) and has no further error handling that could > > abort the firmware upload. > > While detecting the errors during URB completion would be nice, it isn't > necessary. Things would work just as well if the disconnect error were > detected during submission of the following URB. Ah, good point. The in-flight interrupt URB responsible of retrieving the event from the device will fail. It should be sufficient to inject a hardware error event at that point in order to fix this. As such, a simple solution may be to call hci_reset_dev from inside btusb_intr_complete and btusb_submit_intr_urb when the URB submission fails (for "err != -EPERM"). Benjamin
Attachment:
signature.asc
Description: This is a digitally signed message part