On Wed, Nov 03, 2021 at 08:31:03PM +0100, Marcel Holtmann wrote: > Hi Alan, > > >> a user is seeing a hang in fprintd while enumerating devices which > >> appears to be caused by an interaction of: > >> > >> * system is resuming from S3 > >> * btusb starts loading firmware > >> * bluetooth device disappears (probably thinkpad_acpi rfkill) > >> * libusb enumerates USB devices (fprintd in this case) > >> > >> When this happens, the firmware load fails after a timeout of 10s. It > >> appears that if userspace queries information about the root hub in > >> question during this time, it will hang until the btusb firmware load > >> has timed out. > >> > >> Attaching the full kernel log, below an excerpt, you can see: > >> * At :12 device removal: "usb 5-4: USB disconnect, device number 33" > >> * libusb enumeration retrieves information about the usb5 root hub, > >> and blocks on this > >> * At :14 there is a tx timeout on hci0 > >> * At :23 the firmware load finally fails > >> * Then usb_disable_device happens > >> * libusb/fprintd gets the usb5 HUB information and continues its > >> enumeration > >> > >> As I see it, there may be two issues: > >> 1. userspace should not block due to the firmware load hanging > >> 2. btusb should give up more quickly when the device disappears > >> > >> Does anyone have a good idea about the possible cause or how we can fix > >> the problem? > >> > >> Downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2019857 > > > > I'm not familiar with the btusb driver, so someone on the > > linux-bluetooth mailing list would have a better idea about this. > > However, it does look as though btusb keeps the device locked during the > > entire 10-second period while it tries to send over the firmware, and it > > doesn't abort the procedure when it starts getting disconnection errors > > but instead persists until a timeout expires. Keeping the device locked > > would certainly block lsusb. > > > > In general, locking the device during a firmware upload seems like > > the right thing to do -- you don't want extraneous transfers from > > other processes messing up the firmware! So overall, it appears that > > the whole problem would be solved if the firmware transfer were > > aborted as soon as the -ENODEV errors start appearing. > > the problem seems to be that we hitting HCI command timeout. So the > firmware download is done via HCI commands. These commands are send to > the transport driver btusb.c via hdev->send (as btusb_send_frame). > This triggers the usb_submit_urb or queues them via data->deferred > anchor. All this reports back the error properly except that nobody > does anything with it. > > See hci_send_frame() last portion: > > err = hdev->send(hdev, skb); > if (err < 0) { > bt_dev_err(hdev, "sending frame failed (%d)", err); > kfree_skb(skb); > } > > And that is it. We are not checking for ENODEV or any error here. That > means the failure of the HCI command gets only caught via the HCI > command timeout. I don’t know how to do this yet, but you would have > to look there to fail HCI command right away instead of waiting for > the timeout. I have no idea how all the different layers work here. Clearly something has to signal hdev->req_wait_q after setting hdev->req_status to some appropriate value. Can this be done in btusb.c, either when the URB is submitted or when it completes? Or in hci_send_frame? Alan Stern