Re: Control message failures kill entire XHCI stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've come across what appears to be another xHCI issue - attempting to
format a disk with gparted is causing a kernel Oops.  This may not be
related to the issue you're currently investigating, but wanted to
pass it on in case it is (if it isn't let me know and I'll either keep
quiet or raise it separately, whatever you prefer).

I can easily reproduce the crash running 3.19rc6 with Mathias
additional error and debugging messages (debugging switched off).  I
wasn't able to reproduce the issue with xhci debugging enabled, i.e.:

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

Just enabling debugging for xhci-ring.c, i.e.:

echo -n 'file drivers/usb/host/xhci-ring.c =p' >
/sys/kernel/debug/dynamic_debug/control

allowed me to catch the crash with a bit more info.

I've attached the full log at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1412121/+attachment/4306068/+files/syslog-3.19rc6-m0.1-gparted.log

The end of the log is:

Jan 26 22:33:27 alistair-XPS13 kernel: [  908.402917] xhci_hcd
0000:00:14.0: Stalled endpoint
Jan 26 22:33:27 alistair-XPS13 kernel: [  908.402933] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:27 alistair-XPS13 kernel: [  908.402937] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:27 alistair-XPS13 kernel: [  908.402943] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:27 alistair-XPS13 kernel: [  908.402949] xhci_hcd
0000:00:14.0: Giveback URB ffff88020554dcc0, len = 0, expected = 512,
status = -32
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450041] xhci_hcd
0000:00:14.0: Stalled endpoint
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450049] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450050] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450053] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450055] xhci_hcd
0000:00:14.0: Giveback URB ffff88020554d3c0, len = 0, expected = 512,
status = -32
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450256] xhci_hcd
0000:00:14.0: Stalled endpoint
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450258] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450259] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450261] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:28 alistair-XPS13 kernel: [  909.450263] xhci_hcd
0000:00:14.0: Giveback URB ffff880203f6f000, len = 0, expected = 13,
status = -32
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174704] xhci_hcd
0000:00:14.0: Starting stop cmd watchdog timer for slot  6 ep index 2.
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174713] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174716] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174721] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174728] xhci_hcd
0000:00:14.0: Stopped on Transfer TRB
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174738] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174739] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.174743] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.229892] xhci_hcd
0000:00:14.0: Port Status Change Event for port 2
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.229901] xhci_hcd
0000:00:14.0: handle_port_status: starting port polling.
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286659] xhci_hcd
0000:00:14.0: cmdring ctrl reg before ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286666] xhci_hcd
0000:00:14.0: // Ding dong!
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286672] xhci_hcd
0000:00:14.0: cmdring ctrl reg after ringing 0x8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286817] xhci_hcd
0000:00:14.0: Completed reset device command.
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286871] BUG: unable to
handle kernel NULL pointer dereference at 0000000000000040
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.286955] IP:
[<ffffffff815effed>] xhci_discover_or_reset_device+0x33d/0x520
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287030] PGD 0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287054] Oops: 0000 [#1] SMP
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287090] Modules linked
in: nls_iso8859_1 uas usb_storage ctr ccm pci_stub vboxpci(OE)
vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_CHECKSUM iptable_mangle
xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables arc4 iwlmvm
mac80211 dm_crypt rpcsec_gss_krb5 nfsv4 intel_rapl iosf_mbi
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iwlwifi uvcvideo
videobuf2_vmalloc videobuf2_memops snd_hda_codec_realtek
snd_hda_codec_hdmi videobuf2_core crct10dif_pclmul nfsd crc32_pclmul
v4l2_common snd_hda_codec_generic ghash_clmulni_intel snd_soc_rt5640
videodev cfg80211 snd_soc_rl6231 snd_soc_core snd_hda_intel
snd_hda_controller snd_hda_codec aesni_intel auth_rpcgss joydev
hid_rmi aes_x86_64 snd_compress snd_hwdep snd_pcm_dmaengine
dell_laptop lrw nfs_acl dell_wmi media sparse_keymap snd_pcm nfs
gf128mul dcdbas glue_helper snd_seq_midi ablk_helper
snd_seq_midi_event cryptd snd_rawmidi serio_raw snd_seq btusb rfcomm
hid_multitouch bnep snd_seq_device lockd snd_timer mei_me bluetooth
mei snd i2c_hid dw_dmac grace shpchp sunrpc dw_dmac_core lpc_ich
soundcore 8250_dw i2c_designware_platform snd_soc_sst_acpi
spi_pxa2xx_platform i2c_designware_core mac_hid intel_smartconnect
intel_rst fscache parport_pc ppdev coretemp lp parport hid_generic
i915 i2c_algo_bit drm_kms_helper usbhid ahci psmouse hid libahci
sdhci_acpi drm wmi video sdhci
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287884] CPU: 1 PID: 5682
Comm: usb-storage Tainted: G           OE  3.19.0-rc6akg-m0.1+ #4
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287932] Hardware name:
Dell Inc. XPS13 9333/099TN4, BIOS A06 11/07/2014
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.287971] task:
ffff8802066e62c0 ti: ffff8802067b4000 task.ti: ffff8802067b4000
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288012] RIP:
0010:[<ffffffff815effed>]  [<ffffffff815effed>]
xhci_discover_or_reset_device+0x33d/0x520
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288069] RSP:
0018:ffff8802067b7af8  EFLAGS: 00010246
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288099] RAX:
0000000000000000 RBX: ffff8802137aa000 RCX: 0000000000000000
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288137] RDX:
0000000000000040 RSI: ffff8802066e62c0 RDI: ffff8802055eedc8
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288176] RBP:
ffff8802067b7b48 R08: ffff8802067b4000 R09: 0000000000000006
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288214] R10:
0000000000000003 R11: 0000000000000020 R12: ffff88021558a000
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288253] R13:
0000000000000002 R14: 0000000000000001 R15: ffff8802137aa1bc
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288292] FS:
0000000000000000(0000) GS:ffff88021f240000(0000)
knlGS:0000000000000000
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288336] CS:  0010 DS:
0000 ES: 0000 CR0: 0000000080050033
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288368] CR2:
0000000000000040 CR3: 0000000001c13000 CR4: 00000000001407e0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288407] Stack:
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288419]
ffff8800b3e07800 0000000100000000 ffff880206721140 ffff8802137abbfc
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288463]
0000000000000000 ffff8800ad788c00 0000000000000000 0000000000000000
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288507]
0000000000000002 0000000000000032 ffff8802067b7bb8 ffffffff815a0dd9
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288552] Call Trace:
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288573]
[<ffffffff815a0dd9>] hub_port_reset+0x379/0x5e0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288607]
[<ffffffff815a10b5>] hub_port_init+0x75/0xca0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288642]
[<ffffffff810a0b70>] ? wake_up_state+0x20/0x20
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288675]
[<ffffffff815ab735>] ? usb_hcd_reset_endpoint+0x25/0x70
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288712]
[<ffffffff815aeb45>] ? usb_enable_endpoint+0x85/0x90
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288747]
[<ffffffff815a1dfd>] usb_reset_and_verify_device+0x11d/0x760
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288787]
[<ffffffff815a25bb>] usb_reset_device+0x17b/0x2c0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288825]
[<ffffffffc0ac9132>] usb_stor_port_reset+0x62/0x70 [usb_storage]
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288867]
[<ffffffffc0ac91d4>] usb_stor_invoke_transport+0x94/0x610
[usb_storage]
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288914]
[<ffffffff817b4dd4>] ? wait_for_completion_interruptible+0xc4/0x2d0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.288957]
[<ffffffffc0ac7d7e>] usb_stor_transparent_scsi_command+0xe/0x10
[usb_storage]
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289005]
[<ffffffffc0acada9>] usb_stor_control_thread+0x209/0x3a0 [usb_storage]
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289050]
[<ffffffffc0acaba0>] ? usb_stor_disconnect+0x130/0x130 [usb_storage]
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289094]
[<ffffffff8109337b>] kthread+0xdb/0x100
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289124]
[<ffffffff810932a0>] ? kthread_create_on_node+0x180/0x180
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289163]
[<ffffffff817b877c>] ret_from_fork+0x7c/0xb0
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289195]
[<ffffffff810932a0>] ? kthread_create_on_node+0x180/0x180
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289231] Code: ff ff 49
39 87 6c ff ff ff 74 15 49 8b 04 24 48 c7 c6 c8 65 b2 81 48 8b 38 31
c0 e8 de f1 ef ff 49 8b 87 4c ff ff ff 48 8d 50 40 <48> 39 50 40 74 1c
49 8b 04 24 48 c7 c6 f8 65 b2 81 48 8b 38 31
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289400] RIP
[<ffffffff815effed>] xhci_discover_or_reset_device+0x33d/0x520
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289443]  RSP <ffff8802067b7af8>
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.289463] CR2: 0000000000000040
Jan 26 22:33:59 alistair-XPS13 kernel: [  940.301795] ---[ end trace
6ee84852c606375e ]---

Hope this helps,
Alistair





On Mon, Jan 26, 2015 at 12:58 PM, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> On 26.01.2015 05:37, Devin Heitmueller wrote:
>> Hi Mathias,
>>
>> Here's an interesting development:  as a result of a related thread on
>> linux-media, I came across a patch they are distributing in openelec:
>>
>> https://github.com/OpenELEC/OpenELEC.tv/commit/b636927dec20652ff020e54ed7838a2e9be51e03
>>
>> Now I'm not saying that reverting the commit in question is the
>> "right" thing to do, but I applied this patch and for the first time
>> in 100+ tests it started to work (i.e. I'm not seeing the XHCI hcd
>> tear down all the attached devices).
>>
>> Given what I've seen of the bug I cannot really explain why the
>> scatter gather list sizes would have any bearing on TRBs for USB
>> control messages to be added to the queue.  Perhaps we're hitting the
>> upper bound of the list?  Any further speculation on my part would
>> just make me look clueless...
>>
>> It would be great if you could offer any insight as to why the patch
>> in question could be responsible for the behavior we're seeing.  I
>> would really rather not just blindly check this patch into my local
>> tree and declare "victory" without understanding the underlying issue
>> and whether it's likely to cause other problems.
>>
>
> Hi
>
> Thanks for digging this info out.
>
> I'm starting to think maybe there could be something wrong in the ring expansion,
> or ring memory management in general.
>
> The failed control endpoint stopping was preceeded by a ring expansion, (in all the logs I got)
> If the ring expansion already kills xhci it would explain why we never see the control message on the bus.
>
> The patch you show increases the TRB_PER_SEGMENT size from 64 to 256, this again reduces the need for ring expansion
> as each segment is already four times bigger initially.
>
> Another thing pointing to ring memory management is the additional checks I added printing:
>
> "Cancelled TD not on stopped ring"
> "Cancel URB NOT on current ring"
>
> Hitting these means that the the urb we want to cancel points to a segment that is no longer part of that endpoints ring.
> But we still alter the memory of that segment.
> We cache, free, and re-use segments as endpoints are dropped and added, or rings expanded.
> We might have randomly altered a segment that just got used by another ring after a ring expansion.
>
> Atleast thats a new theory.
> I need to do more hacking
>
> -Mathias
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux