Re: Asmedia USB 1343 crashes

Thomas Fjellstrom <thomas@xxxxxxxxxxxxx> · Thu, 04 May 2017 09:17:50 -0600

On Thursday, May 4, 2017 6:02:53 PM MDT Mathias Nyman wrote:
> On 03.05.2017 22:20, Thomas Fjellstrom wrote:
> > On Wednesday, May 3, 2017 1:54:39 PM MDT Alan Stern wrote:
> >> On Tue, 2 May 2017, Thomas Fjellstrom wrote:
> >>
> >>> I just had a brief lockup, desktop stopped responding, other usb devices not
> >>> on the usb3 controller. Two android devices were in the process of restarting
> >>>
> >>> It doesn't seem to matter what android devices it is.
> >>>
> >>> [294503.849350] ------------[ cut here ]------------
> >>> [294503.849362] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x223/0x230
> >>> [294503.849365] NETDEV WATCHDOG: enp4s0 (igb): transmit queue 0 timed out
> >>> [294503.849367] Modules linked in: sr_mod cdrom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter overlay ebtable_filter ebtables ip6table_filter ip6_tables nfsv3 nfs_acl nfs lockd grace iptable_filter bridge stp llc amdgpu mfd_core fuse vfat fat eeepc_wmi asus_wmi rfkill edac_mce_amd edac_core pcspkr sg amdkfd radeon ttm sunrpc k10temp it87 hwmon_vid fam15h_power efivarfs ip_tables ipv6 autofs4 crc32c_intel i2c_piix4
> >>> [294503.849407] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc7 #8
> >>> [294503.849410] Hardware name: To be filled by O.E.M. To be filled by O.E.M./970 PRO GAMING/AURA, BIOS 0901 11/07/2016
> >>> [294503.849413] Call Trace:
> >>> [294503.849417]  <IRQ>
> >>> [294503.849422]  dump_stack+0x4d/0x63
> >>> [294503.849426]  __warn+0xc6/0xe0
> >>> [294503.849430]  warn_slowpath_fmt+0x46/0x50
> >>> [294503.849434]  dev_watchdog+0x223/0x230
> >>> [294503.849438]  ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849442]  call_timer_fn+0x30/0x160
> >>> [294503.849445]  ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849448]  run_timer_softirq+0x1e1/0x440
> >>> [294503.849453]  ? lapic_next_event+0x18/0x20
> >>> [294503.849456]  ? sched_clock_cpu+0x11/0xd0
> >>> [294503.849459]  __do_softirq+0x101/0x2f0
> >>> [294503.849463]  irq_exit+0xb9/0xc0
> >>> [294503.849466]  smp_apic_timer_interrupt+0x38/0x50
> >>> [294503.849470]  apic_timer_interrupt+0x86/0x90
> >>> [294503.849474] RIP: 0010:acpi_idle_do_entry+0x2c/0x40
> >>> [294503.849476] RSP: 0018:ffffffffb2a03d90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> >>> [294503.849480] RAX: 0000000000000000 RBX: ffff884d1a966c00 RCX: 0000000000000034
> >>> [294503.849483] RDX: 4ec4ec4ec4ec4ec5 RSI: 0000000000000001 RDI: ffff884d1a966c64
> >>> [294503.849485] RBP: ffffffffb2a03dd0 R08: 00000000000003e3 R09: 0000000000000018
> >>> [294503.849487] R10: 00000000000003c1 R11: 00000000000003d4 R12: ffff884d1a966c64
> >>> [294503.849490] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> >>> [294503.849492]  </IRQ>
> >>> [294503.849497]  ? acpi_idle_enter+0xd7/0x290
> >>> [294503.849502]  cpuidle_enter_state+0xed/0x2e0
> >>> [294503.849506]  cpuidle_enter+0x12/0x20
> >>> [294503.849509]  call_cpuidle+0x1e/0x30
> >>> [294503.849512]  do_idle+0x179/0x1d0
> >>> [294503.849515]  cpu_startup_entry+0x5d/0x60
> >>> [294503.849518]  rest_init+0x7f/0x90
> >>> [294503.849522]  start_kernel+0x405/0x412
> >>> [294503.849525]  x86_64_start_reservations+0x24/0x26
> >>> [294503.849528]  x86_64_start_kernel+0x182/0x193
> >>> [294503.849531]  start_cpu+0x14/0x14
> >>> [294503.849534]  ? start_cpu+0x14/0x14
> >>> [294503.849537] ---[ end trace 12db587e781d6e4f ]---
> >>> [294503.849558] igb 0000:04:00.0 enp4s0: Reset adapter
> >>> [294504.576629] xhci_hcd 0000:02:00.0: Stop command ring failed, maybe the host is dead
> >>> [294504.576656] xhci_hcd 0000:02:00.0: Abort command ring failed
> >>> [294504.576799] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
> >>> [294504.576805] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
> >>
> >> At this point you have reached the limit of my knowledge.  The best
> >> person to help is Mathias Nyman, the xHCI maintainer (CC'ed).
> >>
> 
> For some reason stopping the command ring fails. The ring is stopped by writing a
> bit in a register; the hardware is supposed to clear another bit in the same register
> when the ring is stopped. We poll for the second bit immediately after writing the first.
> If the second bit is not cleared after 5 seconds we bail out.
> 
> It could be that hardware never clears the bit.
> 
> You said you had two android phones connected, and both were restarting.
> It could be a race in the command ring stopping code.
> 
> Can you reproduce this xhci issue with only one android device connected?

I'll try my best. I have had issues with this controller with devices just
connected and not restarting, so I can't guarantee that I can reproduce the
exact same issue right away.
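
For reference, the stop sequence described in the quoted text above (write one
bit, poll for a second bit in the same register to clear, give up after 5
seconds) can be sketched roughly as below. This is only a user-space simulation
under stated assumptions, not the driver code: struct fake_host, fake_read(),
poll_bit_clear(), stop_ring() and the bit names and positions are all
hypothetical names made up for illustration. Only the 5 second limit comes from
the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit names and positions, chosen for this sketch only. */
#define RING_STOP_REQUEST (1u << 1)  /* software sets this to ask for a stop */
#define RING_RUNNING      (1u << 3)  /* hardware clears this once stopped */

#define POLL_STEP_US    1u
#define STOP_TIMEOUT_US (5u * 1000u * 1000u)  /* the 5 seconds mentioned above */

/* Simulated controller: one register plus a model of how it reacts. */
struct fake_host {
	uint32_t ring_reg;          /* stands in for the real register */
	uint32_t polls_until_stop;  /* reads needed before the ring stops */
	bool responds;              /* false models hardware that never clears the bit */
};

/* Read the register; a responsive host eventually clears RING_RUNNING. */
static uint32_t fake_read(struct fake_host *h)
{
	if (h->responds && (h->ring_reg & RING_STOP_REQUEST)) {
		if (h->polls_until_stop == 0)
			h->ring_reg &= ~RING_RUNNING;
		else
			h->polls_until_stop--;
	}
	return h->ring_reg;
}

/* Poll until every bit in mask reads as zero, or the time budget runs out. */
static int poll_bit_clear(struct fake_host *h, uint32_t mask, uint32_t timeout_us)
{
	uint32_t waited_us = 0;

	for (;;) {
		if ((fake_read(h) & mask) == 0)
			return 0;   /* hardware reported the ring as stopped */
		if (waited_us >= timeout_us)
			return -1;  /* bail out: the bit never cleared */
		/* A real driver would delay here; the simulation only counts. */
		waited_us += POLL_STEP_US;
	}
}

/* Write the stop bit, then immediately start polling for the running bit. */
static int stop_ring(struct fake_host *h)
{
	h->ring_reg |= RING_STOP_REQUEST;
	return poll_bit_clear(h, RING_RUNNING, STOP_TIMEOUT_US);
}
```

With a host set up as { RING_RUNNING, 3, true }, stop_ring() returns 0 after a
few polls. With responds set to false it returns -1 once the simulated 5
seconds are used up, which corresponds to the "hardware never clears the bit"
case. The sketch does not model the possible race with two devices restarting
at once.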

> -Mathias
> 
> 

-- 
Thomas Fjellstrom
thomas@xxxxxxxxxxxxx