Re: Nexus 4 causes system hang (xhci_hcd)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 24, 2013 at 06:35:14PM +0200, Vincent Thiele wrote:
> My computer completely freezes. Sadly I can't test ssh. Also I use USB
> mouse and keyboard. Google Nexus 4 is an MTP mass storage device. I
> can try to compile a kernel but I'm relatively new to Linux.

Ok, please try to follow those directions, and let me know if you get
stuck.  Compiling a kernel is not too hard, it just takes some time. :)

In the meantime, I've taken a look at the syslog you provided, and I
think that's sufficient to show me what's happening.  I do still need
the dmesg with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned
on, in order to figure out why the host is misbehaving.

Jul 20 20:30:44 XXXXX-PC kernel: [ 4801.868695] hub 3-0:1.0: unable to enumerate USB device on port 1
Jul 20 20:30:44 XXXXX-PC kernel: [ 4802.252111] usb 3-1: new high-speed USB device number 4 using xhci_hcd
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: The canary thread is apparently starving. Taking action.
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Demoting known real-time threads.
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2260 of process 2228 (n/a).
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2259 of process 2228 (n/a).
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2258 of process 2228 (n/a).
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2228 of process 2228 (n/a).
Jul 20 20:30:57 XXXXX-PC rtkit-daemon[1858]: Demoted 4 threads.
Jul 20 20:31:00 XXXXX-PC kernel: [ 4807.244928] xhci_hcd 0000:03:00.0: Timeout while waiting for address device command
Jul 20 20:31:00 XXXXX-PC kernel: [ 4818.041422] [sched_delayed] sched: RT throttling activated
Jul 20 20:31:00 XXXXX-PC kernel: [ 4818.241177] usb 3-1: Device not responding to set address.
Jul 20 20:31:00 XXXXX-PC kernel: [ 4818.444883] usb 3-1: device not accepting address 4, error -71

The USB device didn't respond to the Set Address request, so that
command timed out.  That means the xHCI driver asks the host to abort
the command on the command ring, and takes the xHCI command to issue the
Set Address off the command ring.

So, your host seems to be working fine here, and the code is working as
expected, even though your USB device is not working properly.

Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: The canary thread is apparently starving. Taking action.
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Demoting known real-time threads.
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2260 of process 2228 (n/a).
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2259 of process 2228 (n/a).
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2258 of process 2228 (n/a).
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2228 of process 2228 (n/a).
Jul 20 20:31:10 XXXXX-PC rtkit-daemon[1858]: Demoted 4 threads.
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: The canary thread is apparently starving. Taking action.
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Demoting known real-time threads.
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2260 of process 2228 (n/a).
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2259 of process 2228 (n/a).
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2258 of process 2228 (n/a).
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Successfully demoted thread 2228 of process 2228 (n/a).
Jul 20 20:31:20 XXXXX-PC rtkit-daemon[1858]: Demoted 4 threads.

Not sure what this RT complaints are about, but that doesn't look good.
I'm not sure if it's related to the problem with your host or not.

Jul 20 20:31:22 XXXXX-PC kernel: [ 4823.437728] xhci_hcd 0000:03:00.0: Timeout while waiting for a slot
Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.598729] xhci_hcd 0000:03:00.0: Stopped the command ring failed, maybe the host is dead
Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.598746] xhci_hcd 0000:03:00.0: Abort command ring failed
Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.598888] xhci_hcd 0000:03:00.0: HC died; cleaning up

Next, the xHCI driver asks the host to give it a new slot for the
device.  The host doesn't respond, so the driver attempts to cancel the
command to get a new slot.  This time, the host doesn't respond to the
request to abort the command ring.  So basically, it looks like the host
hardware itself gets really hung.

That means none of your USB keyboards or mice will work.  However, you
should be able to use either a PS2 mouse and keyboard, or perhaps a
bluetooth keyboard.

In any case, we need to figure out whether the host hardware is just
buggy, or if software is doing something wrong that drives it into this
state.  The extra dmesg should help with that.

Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.598974] hub 3-0:1.0: cannot reset port 1 (err = -19)
Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.598977] hub 3-0:1.0: cannot disable port 1 (err = -19)

Now, this next oops is something we need to fix in the driver:

Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.599000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
Jul 20 20:31:22 XXXXX-PC kernel: [ 4839.599002] IP: [<ffffffff815230f3>] xhci_free_dev+0x63/0x160

I think I know what's happening here:

/*
 * At this point, the struct usb_device is about to go away, the device has
 * disconnected, and all traffic has been stopped and the endpoints have been
 * disabled.  Free any HC data structures associated with that device.
 */
void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
{
        struct xhci_hcd *xhci = hcd_to_xhci(hcd);
        struct xhci_virt_device *virt_dev;
        unsigned long flags;
        u32 state;
        int i, ret;
        
        ret = xhci_check_args(hcd, udev, NULL, 0, true, __func__);
        /* If the host is halted due to driver unload, we still need to free the
         * device.
         */
        if (ret <= 0 && ret != -ENODEV)
                return;
        
        virt_dev = xhci->devs[udev->slot_id];

The above line assumes the usb_device has a valid Slot ID.  However, we
just saw in the log that the command to get a slot ID timed out.
Therefore, the slot_id is going to be zero, which means the pointer
assigned to virt_dev will be NULL, or possibly a pointer from another
device.

I can create a fix for this, but I'm going to need you to test it.
Please send me dmesg with CONFIG_USB_DEBUGGING and
CONFIG_USB_XHCI_HCD_DEBUGGING enabled.  Once you've done that, I will
send you a patch to apply on top of Linus' kernel, and I'll have you
re-test.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux