Re: Userspace enumeration hang while btusb tries to load firmware of removed device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 03, 2021 at 08:31:03PM +0100, Marcel Holtmann wrote:
> Hi Alan,
> 
> >> a user is seeing a hang in fprintd while enumerating devices which
> >> appears to be caused by an interaction of:
> >> 
> >> * system is resuming from S3
> >> * btusb starts loading firmware
> >> * bluetooth device disappears (probably thinkpad_acpi rfkill)
> >> * libusb enumerates USB devices (fprintd in this case)
> >> 
> >> When this happens, the firmware load fails after a timeout of 10s. It
> >> appears that if userspace queries information about the root hub in
> >> question during this time, it will hang until the btusb firmware load
> >> has timed out.
> >> 
> >> Attaching the full kernel log, below an excerpt, you can see:
> >> * At :12 device removal: "usb 5-4: USB disconnect, device number 33"
> >> * libusb enumeration retrieves information about the usb5 root hub,
> >>   and blocks on this
> >> * At :14 there is a tx timeout on hci0
> >> * At :23 the firmware load finally fails
> >> * Then usb_disable_device happens
> >> * libusb/fprintd gets the usb5 HUB information and continues its
> >>   enumeration
> >> 
> >> As I see it, there may be two issues:
> >> 1. userspace should not block due to the firmware load hanging
> >> 2. btusb should give up more quickly when the device disappears
> >> 
> >> Does anyone have a good idea about the possible cause or how we can fix
> >> the problem?
> >> 
> >> Downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2019857
> > 
> > I'm not familiar with the btusb driver, so someone on the 
> > linux-bluetooth mailing list would have a better idea about this. 
> > However, it does look as though btusb keeps the device locked during the 
> > entire 10-second period while it tries to send over the firmware, and it 
> > doesn't abort the procedure when it starts getting disconnection errors 
> > but instead persists until a timeout expires.  Keeping the device locked 
> > would certainly block lsusb.
> > 
> > In general, locking the device during a firmware upload seems like
> > the right thing to do -- you don't want extraneous transfers from
> > other processes messing up the firmware!  So overall, it appears that
> > the whole problem would be solved if the firmware transfer were
> > aborted as soon as the -ENODEV errors start appearing.
> 
> the problem seems to be that we hitting HCI command timeout. So the 
> firmware download is done via HCI commands. These commands are send to 
> the transport driver btusb.c via hdev->send (as btusb_send_frame). 
> This triggers the usb_submit_urb or queues them via data->deferred 
> anchor. All this reports back the error properly except that nobody 
> does anything with it.
> 
> See hci_send_frame() last portion:
> 
>         err = hdev->send(hdev, skb);
>         if (err < 0) {
>                 bt_dev_err(hdev, "sending frame failed (%d)", err);
>                 kfree_skb(skb);
>         }
> 
> And that is it. We are not checking for ENODEV or any error here. That 
> means the failure of the HCI command gets only caught via the HCI 
> command timeout. I don’t know how to do this yet, but you would have 
> to look there to fail HCI command right away instead of waiting for 
> the timeout.

I have no idea how all the different layers work here.  Clearly 
something has to signal hdev->req_wait_q after setting hdev->req_status 
to some appropriate value.  Can this be done in btusb.c, either when the 
URB is submitted or when it completes?  Or in hci_send_frame?

Alan Stern



[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux