On Thursday, May 4, 2017 6:02:53 PM MDT Mathias Nyman wrote:
> On 03.05.2017 22:20, Thomas Fjellstrom wrote:
> > On Wednesday, May 3, 2017 1:54:39 PM MDT Alan Stern wrote:
> >> On Tue, 2 May 2017, Thomas Fjellstrom wrote:
> >>
> >>> I just had a brief lockup, desktop stopped responding, other usb devices not
> >>> on the usb3 controller. Two android devices were in the process of restarting.
> >>>
> >>> It doesn't seem to matter which android devices they are.
> >>>
> >>> [294503.849350] ------------[ cut here ]------------
> >>> [294503.849362] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x223/0x230
> >>> [294503.849365] NETDEV WATCHDOG: enp4s0 (igb): transmit queue 0 timed out
> >>> [294503.849367] Modules linked in: sr_mod cdrom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter overlay ebtable_filter ebtables ip6table_filter ip6_tables nfsv3 nfs_acl nfs lockd grace iptable_filter bridge stp llc amdgpu mfd_core fuse vfat fat eeepc_wmi asus_wmi rfkill edac_mce_amd edac_core pcspkr sg amdkfd radeon ttm sunrpc k10temp it87 hwmon_vid fam15h_power efivarfs ip_tables ipv6 autofs4 crc32c_intel i2c_piix4
> >>> [294503.849407] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc7 #8
> >>> [294503.849410] Hardware name: To be filled by O.E.M. To be filled by O.E.M./970 PRO GAMING/AURA, BIOS 0901 11/07/2016
> >>> [294503.849413] Call Trace:
> >>> [294503.849417]  <IRQ>
> >>> [294503.849422]  dump_stack+0x4d/0x63
> >>> [294503.849426]  __warn+0xc6/0xe0
> >>> [294503.849430]  warn_slowpath_fmt+0x46/0x50
> >>> [294503.849434]  dev_watchdog+0x223/0x230
> >>> [294503.849438]  ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849442]  call_timer_fn+0x30/0x160
> >>> [294503.849445]  ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849448]  run_timer_softirq+0x1e1/0x440
> >>> [294503.849453]  ? lapic_next_event+0x18/0x20
> >>> [294503.849456]  ? sched_clock_cpu+0x11/0xd0
> >>> [294503.849459]  __do_softirq+0x101/0x2f0
> >>> [294503.849463]  irq_exit+0xb9/0xc0
> >>> [294503.849466]  smp_apic_timer_interrupt+0x38/0x50
> >>> [294503.849470]  apic_timer_interrupt+0x86/0x90
> >>> [294503.849474] RIP: 0010:acpi_idle_do_entry+0x2c/0x40
> >>> [294503.849476] RSP: 0018:ffffffffb2a03d90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> >>> [294503.849480] RAX: 0000000000000000 RBX: ffff884d1a966c00 RCX: 0000000000000034
> >>> [294503.849483] RDX: 4ec4ec4ec4ec4ec5 RSI: 0000000000000001 RDI: ffff884d1a966c64
> >>> [294503.849485] RBP: ffffffffb2a03dd0 R08: 00000000000003e3 R09: 0000000000000018
> >>> [294503.849487] R10: 00000000000003c1 R11: 00000000000003d4 R12: ffff884d1a966c64
> >>> [294503.849490] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> >>> [294503.849492]  </IRQ>
> >>> [294503.849497]  ? acpi_idle_enter+0xd7/0x290
> >>> [294503.849502]  cpuidle_enter_state+0xed/0x2e0
> >>> [294503.849506]  cpuidle_enter+0x12/0x20
> >>> [294503.849509]  call_cpuidle+0x1e/0x30
> >>> [294503.849512]  do_idle+0x179/0x1d0
> >>> [294503.849515]  cpu_startup_entry+0x5d/0x60
> >>> [294503.849518]  rest_init+0x7f/0x90
> >>> [294503.849522]  start_kernel+0x405/0x412
> >>> [294503.849525]  x86_64_start_reservations+0x24/0x26
> >>> [294503.849528]  x86_64_start_kernel+0x182/0x193
> >>> [294503.849531]  start_cpu+0x14/0x14
> >>> [294503.849534]  ? start_cpu+0x14/0x14
> >>> [294503.849537] ---[ end trace 12db587e781d6e4f ]---
> >>> [294503.849558] igb 0000:04:00.0 enp4s0: Reset adapter
> >>> [294504.576629] xhci_hcd 0000:02:00.0: Stop command ring failed, maybe the host is dead
> >>> [294504.576656] xhci_hcd 0000:02:00.0: Abort command ring failed
> >>> [294504.576799] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
> >>> [294504.576805] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
> >>
> >> At this point you have reached the limit of my knowledge.
> >> The best person to help is Mathias Nyman, the xHCI maintainer (CC'ed).
> >>
>
> For some reason stopping the command ring fails. The ring is stopped by writing a
> bit in a register, and the hardware is supposed to clear another bit in the same
> register once the ring has stopped. We poll for the second bit immediately after
> writing the first. If the second bit is not cleared after 5 seconds, we bail out.
>
> It could be that the hardware never clears the bit.
>
> You said you had two android phones connected, and both were restarting.
> It could be a race in the command ring stopping code.
>
> Can you reproduce this xhci issue with only one android device connected?

I'll try my best. I have had issues with this controller with devices merely
connected and not restarting, so I can't guarantee that I can reproduce the
exact same issue right away.

> -Mathias

-- 
Thomas Fjellstrom
thomas@xxxxxxxxxxxxx
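Mathias's description of the stop sequence amounts to a write-then-poll handshake with a timeout. The following is a user-space Python model of that pattern, not the kernel's xhci code: `FakeCommandRingReg`, `handshake()` and `stop_command_ring()` are hypothetical names, the register is simulated, and the bit positions are an assumption modeled on the xHCI Command Ring Control Register (Command Stop, Command Ring Running). Passing `clear_after=None` models the failure Mathias suspects, where the hardware never clears the bit.

```python
import time

# ASSUMPTION: bit positions modeled on the xHCI Command Ring Control
# Register (CS = Command Stop, CRR = Command Ring Running). Illustrative only.
CMD_RING_STOP = 1 << 1
CMD_RING_RUNNING = 1 << 3


class FakeCommandRingReg:
    """Hypothetical simulated register standing in for real hardware.

    After the stop bit is written, the "hardware" clears the running bit on
    the clear_after-th read. clear_after=None models hardware that never
    clears it.
    """

    def __init__(self, clear_after=None):
        self.value = CMD_RING_RUNNING
        self.clear_after = clear_after
        self.reads_since_stop = 0

    def write(self, bits):
        self.value |= bits

    def read(self):
        if self.value & CMD_RING_STOP:
            self.reads_since_stop += 1
            if self.clear_after is not None and self.reads_since_stop >= self.clear_after:
                # Ring has stopped: drop both the running and stop bits.
                self.value &= ~(CMD_RING_RUNNING | CMD_RING_STOP)
        return self.value


def handshake(reg, mask, timeout_s, poll_interval_s=0.001):
    """Poll until (reg & mask) == 0; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if (reg.read() & mask) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def stop_command_ring(reg, timeout_s=5.0):
    """Write the stop bit, then wait for the running bit to clear."""
    reg.write(CMD_RING_STOP)
    # Polling starts immediately after the write; 5 s default per the thread.
    return handshake(reg, CMD_RING_RUNNING, timeout_s)


if __name__ == "__main__":
    print(stop_command_ring(FakeCommandRingReg(clear_after=3)))    # True
    print(stop_command_ring(FakeCommandRingReg(), timeout_s=0.05))  # False
```

With `clear_after=3` the stop succeeds on the third read; with `clear_after=None` the poll gives up once `timeout_s` elapses, which is roughly the situation the "Stop command ring failed" line in the log reports.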