Hello, I'm debugging some issues on a couple of different USB TV tuners which boil down to the following error: xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. This is followed by the XHCI driver disconnecting *all* USB devices from the controller. I've done a bit of debugging, and the root of the issue appears to be an intermittent control message timing out, and then the call to usb_kill_urb() that occurs inside of usb_control_msg() when the timeout expires is what causes the disconnect. Specifically, it would appear that xhci_urb_dequeue tries to stop the endpoint using xhci_queue_stop_endpoint(), the command gets queued but the IRQ never fires to perform the TRB_STOP_RING completion code. The function xhci_stop_endpoint_command_watchdog() fires after five seconds, which tears down the entire driver. Below is the dmesg output with the xhci_hcd debugging enabled. The dump_stack() call is something I added (i.e. it's not an OOPS) so I could see which code path was making the usb_kill_urb() call that was failing. Note that the caller is using usb_control_msg() with 1000ms timeout, and we can see from the timestamps that the timer expires which is what causes the call to usb_kill_urb(). I would imagine that explicitly killing URBs is a pretty uncommon task for control endpoint messages (compared to ISOC or BULK endpoints where it's done regularly). Is it possible a exception case has been missed? Independent of the usb_kill_urb() killing the entire stack, I still don't really understand yet why the control message failed in the first place. This is a well-exercised code path in the au0828 driver (related to I2C transfers) and I've never seen this when using the EHCI driver. My assumption is that either the HCD is getting sick which is causing both the control message to fail as well as putting it into an inconsistent state such that we never get the TRB_STOP_RING IRQ, or we've got two separate bugs - the control message failing for some "legitimate" reason (i.e. I screwed something up in my au0828 driver), followed by the usb_kill_urb() error simply not handling killing of URBs on a control endpoint (which causes the entire stack to go down). Thoughts/suggestions/recommendations are welcome. Thanks in advance, Devin Jan 18 14:04:05 devin-MacBookPro kernel: [ 9119.647249] au0828: au0828_writereg(0x0100, 0x00) Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645091] xhci_hcd 0000:00:14.0: Cancel URB ffff8802543c36c0, dev 1, ep 0x0, starting at offset 0x25c358940 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645365] djh dequeue pending=0 ep_index=0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645632] CPU: 1 PID: 2782 Comm: tvtime Tainted: P OE 3.18.0-rc4djh+ #33 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.645921] Hardware name: Apple Inc. MacBookPro11,1/Mac-189A3D4F975D5FFC, BIOS MBP111.88Z.0138.B11.1408291433 08/29/2014 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646236] ffff88025b9b0000 ffff88023ea2f9d8 ffffffff817445c1 0000000000000000 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646570] ffff8802543c36c0 ffff88023ea2fa58 ffffffffa0080b2e 000000025c358940 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.646909] ffff88023ea2fa48 000000003ea2fa18 ffff88023e8a22a0 0000000000000000 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647256] Call Trace: Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647605] [<ffffffff817445c1>] dump_stack+0x46/0x58 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.647981] [<ffffffffa0080b2e>] xhci_urb_dequeue+0x28e/0x420 [xhci_hcd] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.648357] [<ffffffff8157f1fd>] unlink1+0x2d/0x130 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.648743] [<ffffffff810cbd80>] ? internal_add_timer+0xb0/0xb0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649133] [<ffffffff814aaa67>] ? get_device+0x17/0x30 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649526] [<ffffffff8158129d>] usb_hcd_unlink_urb+0x5d/0xf0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.649928] [<ffffffff8158281a>] usb_kill_urb+0x3a/0xa0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.650334] [<ffffffff81093890>] ? wake_up_state+0x20/0x20 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.650746] [<ffffffff81583068>] usb_start_wait_urb+0xc8/0x150 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.651166] [<ffffffff811b4065>] ? __kmalloc+0x55/0x190 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.651586] [<ffffffff815831b5>] usb_control_msg+0xc5/0x110 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652011] [<ffffffffa05362a9>] au0828_writereg+0x79/0xf0 [au0828] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652447] [<ffffffff810ccfaf>] ? try_to_del_timer_sync+0x4f/0x70 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.652894] [<ffffffffa053b929>] au0828_analog_stream_disable+0x29/0x50 [au0828] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.653354] [<ffffffffa053ba29>] vidioc_streamoff+0xd9/0x1c0 [au0828] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.653823] [<ffffffffa0b5a98a>] v4l_streamoff+0x1a/0x20 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.654298] [<ffffffffa0b5d484>] __video_do_ioctl+0x274/0x2f0 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.654786] [<ffffffff811c4dcf>] ? commit_charge+0x1f/0x80 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.655278] [<ffffffffa0b5fdbe>] video_usercopy+0x21e/0x580 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.655784] [<ffffffffa0b5d210>] ? v4l_dqevent+0x20/0x20 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.656295] [<ffffffff8105623c>] ? __do_page_fault+0x20c/0x550 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.656815] [<ffffffffa0b60135>] video_ioctl2+0x15/0x20 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.657359] [<ffffffffa0b596bf>] v4l2_ioctl+0x10f/0x150 [videodev] Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.657895] [<ffffffff8111c2dc>] ? acct_account_cputime+0x1c/0x20 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.658440] [<ffffffff811e3040>] do_vfs_ioctl+0x2e0/0x4c0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.658993] [<ffffffff81097374>] ? vtime_account_user+0x54/0x60 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.659551] [<ffffffff81022097>] ? syscall_trace_enter_phase2+0xa7/0x1d0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.660116] [<ffffffff811e32a1>] SyS_ioctl+0x81/0xa0 Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.660690] [<ffffffff8174cc24>] tracesys_phase2+0xd8/0xdd Jan 18 14:04:06 devin-MacBookPro kernel: [ 9120.664207] xhci_hcd 0000:00:14.0: // Ding dong! Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.671045] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.671679] xhci_hcd 0000:00:14.0: Assuming host is dying, halting host. Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.672317] xhci_hcd 0000:00:14.0: // Halt the HC Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.672950] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 1, ep index 0 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.673602] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 1, ep index 2 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.674246] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 1, ep index 3 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.674900] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 2, ep index 0 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.676020] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 2, ep index 2 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.677146] xhci_hcd 0000:00:14.0: get port status, actual port 2 status = 0x1203 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.678271] xhci_hcd 0000:00:14.0: Get port status returned 0x203 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.679400] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 3, ep index 0 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.680567] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 3, ep index 2 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.681736] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 3, ep index 6 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.682908] xhci_hcd 0000:00:14.0: set port reset, actual port 2 status = 0x1311 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.684078] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 3, ep index 8 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.684776] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 0 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.685464] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 2 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.686158] Bluetooth: hci0 urb ffff88025b997b40 failed to resubmit (22) Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.686863] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 3 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.687576] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 4 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.688298] Bluetooth: hci0 urb ffff88025b997240 failed to resubmit (22) Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.689024] Bluetooth: hci0 urb ffff88025b9973c0 failed to resubmit (22) Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.689752] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 5 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.690491] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 6 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.691239] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 7 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.691983] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 6, ep index 8 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.692725] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 7, ep index 0 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.693470] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 7, ep index 2 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.694214] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 7, ep index 4 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.694981] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 7, ep index 6 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.695728] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 7, ep index 8 Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.696477] xhci_hcd 0000:00:14.0: Calling usb_hc_died() Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.697229] xhci_hcd 0000:00:14.0: HC died; cleaning up Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.697985] xhci_hcd 0000:00:14.0: xHCI host controller is dead. Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.698750] xhci_hcd 0000:00:14.0: Command timeout on stopped ring Jan 18 14:04:11 devin-MacBookPro kernel: [ 9125.699521] xhci_hcd 0000:00:14.0: Turn aborted command ffff88025c227350 to no-op -- Devin J. Heitmueller - Kernel Labs http://www.kernellabs.com -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html