Re: usb_kill_urb deadlock with multiple hubs in series (2.6.29)

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Sun, 22 Nov 2009 11:49:29 -0500 (EST)

On Sun, 22 Nov 2009, Simon Arlott wrote:

> 21 19:37:36 [58320.379525] temperhum     D 000034e2     0  2212   2210
> would have 3 USB devices open, but two of them are on other controllers
> and the devices are accessed serially.

The fact that they are on other controllers need not matter.  After
all, the usb_kill_urb_queue wait_queue_head is shared among all the
buses.

> There are 3 more USB
> devices; I didn't check all of them but at least one was still working
> properly. The other two would not have been actively in use.
> 
> If it happens again I'll get a stack trace of all tasks. Aren't they
> stuck in the following call inside usb_kill_urb()?
>   wait_event(usb_kill_urb_queue, atomic_read(&urb->use_count) == 0);

Yes.  And I was wrong before -- they aren't stuck waiting for the 
spinlock; they are waiting for the condition to be true.
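
To make the distinction concrete, here is a rough user-space sketch of that
wait.  It is not the kernel code: struct model_urb, model_giveback(),
model_kill_wait(), the two hcd_* callbacks and the max_steps bound are all
made up for illustration.  The real wait_event() call has no timeout, which
is exactly why a URB that is never given back leaves the caller in D state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct urb; only the field that
 * matters for the wait is modeled. */
struct model_urb {
	atomic_int use_count;	/* nonzero while the HCD still owns the URB */
};

/* Stand-in for the giveback path: the only place use_count drops. */
static void model_giveback(struct model_urb *urb)
{
	atomic_fetch_sub(&urb->use_count, 1);
}

/* Stand-in for one step of HCD work; returns true if it completed the URB. */
typedef bool (*model_hcd_step)(struct model_urb *urb);

/* A healthy controller/driver: the URB is given back. */
static bool hcd_completes(struct model_urb *urb)
{
	model_giveback(urb);
	return true;
}

/* A confused controller/driver: the URB is never given back. */
static bool hcd_confused(struct model_urb *urb)
{
	(void)urb;
	return false;
}

/*
 * Models the wait in usb_kill_urb() by polling the same condition that
 * wait_event() checks.  max_steps exists only so this model can report a
 * hang instead of spinning forever.
 * Returns 0 once use_count == 0, -1 if it never got there.
 */
static int model_kill_wait(struct model_urb *urb, model_hcd_step step,
			   int max_steps)
{
	assert(urb != NULL && step != NULL);
	for (int i = 0; i < max_steps; i++) {
		if (atomic_load(&urb->use_count) == 0)
			return 0;
		step(urb);
	}
	return atomic_load(&urb->use_count) == 0 ? 0 : -1;
}
```

With use_count starting at 1, model_kill_wait() returns 0 when driven by
hcd_completes and -1 when driven by hcd_confused.  No lock is involved in
either case; the only thing that matters is whether the giveback ever runs.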

> Before that, it calls usb_hcd_unlink_urb(), but the return value is
> ignored?

Yes.  At this point we just want to make sure the URB gets killed.  We 
don't care if it was already killed by someone else or if it wasn't 
running in the first place.

You'd get into trouble if you called usb_kill_urb() for an URB that was 
already deallocated, but that can't happen in usb_start_wait_urb().

> One possible path is:
>   usb_hcd_unlink_urb()
>     unlink1() // use_count > 0
>       usb_rh_urb_dequeue()
>         usb_hcd_check_unlink_urb()
> 
> which could return non-zero stopping usb_hcd_unlink_urb() from calling
> usb_hcd_giveback_urb() and then the use_count isn't decremented. It
> looks like the hardware would then be responsible for causing
> usb_hcd_giveback_urb() to run, unless usb_hcd_poll_rh_status() runs.

Not the hardware -- the software.  URBs sent to root hubs are handled
by usbcore and the controller drivers.  But the URBs in question
weren't sent to root hubs; the URB from temperhum was sent to a serial
device managed by the ch341 driver and the URB from khubd was sent to 
an external hub.

The only reason for these calls to get stuck is that either the OHCI 
controller hardware or the ohci-hcd driver got confused and failed to 
complete the URBs properly.

> This code in usb_hcd_check_unlink_urb() does not match its comment:
>    /* Any status except -EINPROGRESS means something already started to
>     * unlink this URB from the hardware.  So there's no more work to do.
>     */
>    if (urb->unlinked)
>       return -EBUSY;
>    urb->unlinked = status;
> "Any status except -EINPROGRESS" but it doesn't check what the value of
> urb->unlinked is, so it could be -EINPROGRESS.

The comment is out of date.  It should say something like:

	If urb->unlinked is nonzero then someone else has already
	started to unlink this URB from the hardware, so there's
	nothing more to do.
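
In context, the check with the reworded comment would read roughly as in
the sketch below.  This is a simplified user-space model, not the kernel
source: struct model_urb and model_check_unlink() are invented names, and
the real function does more work around this test that is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cut-down URB: only the field the check looks at. */
struct model_urb {
	int unlinked;	/* 0 while linked, else the unlink status code */
};

/*
 * Simplified model of the test quoted from usb_hcd_check_unlink_urb().
 * Returns -EBUSY if an unlink is already in progress, otherwise records
 * the requested status and returns 0.
 */
static int model_check_unlink(struct model_urb *urb, int status)
{
	assert(urb != NULL);

	/* If urb->unlinked is nonzero then someone else has already
	 * started to unlink this URB from the hardware, so there's
	 * nothing more to do.
	 */
	if (urb->unlinked)
		return -EBUSY;
	urb->unlinked = status;
	return 0;
}
```

The first caller records its status and gets 0 back; any later caller gets
-EBUSY and the recorded status is left alone.  The test only distinguishes
zero from nonzero, which is what the reworded comment describes.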

> I can try connecting multiple hub devices which won't stay stable again,
> but not on the same hardware (although it'll still be an OHCI HCD).

Go ahead.  Collect a usbmon trace when you do it.  That way we'll be 
able to see when URBs start and complete.  You should also look at the 
contents of the debugging files for the OHCI controller after the hang 
occurs.

Alan Stern
