Hi,
Today I have "successfully" reproduced the bug on 3.7.4 kernel. The
symptoms looks to be exactly the same - null pointer dereference in
interrupt context, somewhere in xhci IRQ handler. Unfortunately this
time I have no core dumps - for some reason crash kernel is not started
after the crash, however dmesg shows that the space is preallocated for
it. However, I think the root cause is the same as in previous cores, so
it is still would be worth to analyze them. Here is a new picture of the
panic:
https://dl.dropbox.com/u/8276110/3.7.4%20panic.jpg
$ cat /proc/version
Linux version 3.7.4 (vagran@AST-mobile) (gcc version 4.7.2
(Ubuntu/Linaro 4.7.2-2ubuntu1) ) #1 SMP Sat Jan 26 12:24:36 EET 2013
The warning followed by bus malfunction also still persists:
Jan 26 13:03:42 AST-mobile kernel: [ 231.054897] usb 3-2: new
low-speed USB device number 14 using xhci_hcd
Jan 26 13:03:47 AST-mobile kernel: [ 236.067310] usb 3-2: device
descriptor read/all, error -110
Jan 26 13:03:47 AST-mobile kernel: [ 236.179193] usb 3-2: new
low-speed USB device number 15 using xhci_hcd
Jan 26 13:03:52 AST-mobile kernel: [ 241.175598] xhci_hcd
0000:00:14.0: Timeout while waiting for address device command
Jan 26 13:03:53 AST-mobile kernel: [ 241.379407] ------------[ cut
here ]------------
Jan 26 13:03:53 AST-mobile kernel: [ 241.379426] WARNING: at
drivers/usb/host/xhci.c:3657 xhci_address_device+0x2db/0x300()
Jan 26 13:03:53 AST-mobile kernel: [ 241.379429] Hardware name: 24382LU
Jan 26 13:03:53 AST-mobile kernel: [ 241.379431] Modules linked in:
ftdi_sio usbserial bnep rfcomm parport_pc ppdev binfmt_misc ext2 arc4
iwldvm mac80211 coretemp snd_hda_codec_realtek kvm_intel kvm
snd_hda_intel snd_hda_codec iwlwifi snd_hwdep snd_pcm thinkpad_acpi
ghash_clmulni_intel aesni_intel ablk_helper btusb snd_seq_midi cryptd
snd_rawmidi bluetooth lrw snd_seq_midi_event aes_x86_64 snd_seq xts
cfg80211 gf128mul psmouse snd_timer snd_seq_device microcode serio_raw
joydev snd nvram tpm_tis lpc_ich soundcore mei snd_page_alloc mac_hid
usbmon lp parport nouveau hid_generic i915 sdhci_pci sdhci usbhid hid
ttm firewire_ohci drm_kms_helper firewire_core e1000e drm crc_itu_t
i2c_algo_bit mxm_wmi wmi video
Jan 26 13:03:53 AST-mobile kernel: [ 241.379505] Pid: 57, comm: khubd
Not tainted 3.7.4 #1
Jan 26 13:03:53 AST-mobile kernel: [ 241.379507] Call Trace:
Jan 26 13:03:53 AST-mobile kernel: [ 241.379518] [<ffffffff810575bf>]
warn_slowpath_common+0x7f/0xc0
Jan 26 13:03:53 AST-mobile kernel: [ 241.379523] [<ffffffff8105761a>]
warn_slowpath_null+0x1a/0x20
Jan 26 13:03:53 AST-mobile kernel: [ 241.379528] [<ffffffff814ea5cb>]
xhci_address_device+0x2db/0x300
Jan 26 13:03:53 AST-mobile kernel: [ 241.379535] [<ffffffff814bd3c9>]
hub_port_init+0x229/0x9f0
Jan 26 13:03:53 AST-mobile kernel: [ 241.379543] [<ffffffff8143763d>]
? pm_runtime_set_autosuspend_delay+0x5d/0x80
Jan 26 13:03:53 AST-mobile kernel: [ 241.379551] [<ffffffff81044b49>]
? default_spin_lock_flags+0x9/0x10
Jan 26 13:03:53 AST-mobile kernel: [ 241.379556] [<ffffffff814c051f>]
hub_thread+0x54f/0x1380
Jan 26 13:03:53 AST-mobile kernel: [ 241.379563] [<ffffffff8107c910>]
? finish_wait+0x80/0x80
Jan 26 13:03:53 AST-mobile kernel: [ 241.379568] [<ffffffff814bffd0>]
? usb_remote_wakeup+0x40/0x40
Jan 26 13:03:53 AST-mobile kernel: [ 241.379572] [<ffffffff8107bfc0>]
kthread+0xc0/0xd0
Jan 26 13:03:53 AST-mobile kernel: [ 241.379577] [<ffffffff8107bf00>]
? kthread_create_on_node+0x130/0x130
Jan 26 13:03:53 AST-mobile kernel: [ 241.379584] [<ffffffff816a102c>]
ret_from_fork+0x7c/0xb0
Jan 26 13:03:53 AST-mobile kernel: [ 241.379588] [<ffffffff8107bf00>]
? kthread_create_on_node+0x130/0x130
Jan 26 13:03:53 AST-mobile kernel: [ 241.379591] ---[ end trace
b3ec4f772c11ce62 ]---
Jan 26 13:03:53 AST-mobile kernel: [ 241.379598] xhci_hcd
0000:00:14.0: Virt dev invalid for slot_id 0xe!
Jan 26 13:03:53 AST-mobile kernel: [ 241.583272] usb 3-2: device not
accepting address 15, error -22
Jan 26 13:03:53 AST-mobile kernel: [ 241.583296] xHCI xhci_free_dev
called with unaddressed device
Jan 26 13:03:58 AST-mobile kernel: [ 246.579674] xhci_hcd
0000:00:14.0: Timeout while waiting for a slot
Jan 26 13:03:58 AST-mobile kernel: [ 246.579692] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Jan 26 13:04:18 AST-mobile kernel: [ 266.681173] xhci_hcd
0000:00:14.0: Timeout while waiting for a slot
Jan 26 13:04:18 AST-mobile kernel: [ 266.681191] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Jan 26 13:05:19 AST-mobile kernel: [ 328.100883] xhci_hcd
0000:00:14.0: Timeout while waiting for a slot
Jan 26 13:05:19 AST-mobile kernel: [ 328.100902] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Jan 26 13:05:32 AST-mobile kernel: [ 341.027570] xhci_hcd
0000:00:14.0: Timeout while waiting for a slot
Jan 26 13:05:32 AST-mobile kernel: [ 341.027590] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Also I found another case of the bus malfunction - some allocation
failures are reported. As with the previous problem it continues even
when normal device is plugged in after the bad one:
Jan 26 13:09:38 AST-mobile kernel: [ 569.791564] usb 3-2: new
low-speed USB device number 31 using xhci_hcd
Jan 26 13:09:43 AST-mobile kernel: [ 574.808971] usb 3-2: device
descriptor read/all, error -110
Jan 26 13:09:44 AST-mobile kernel: [ 574.920977] usb 3-2: new
low-speed USB device number 32 using xhci_hcd
Jan 26 13:09:49 AST-mobile kernel: [ 579.935365] usb 3-2: device
descriptor read/8, error -110
Jan 26 13:09:49 AST-mobile kernel: [ 580.056632] usb 3-2: device
descriptor read/8, error -71
Jan 26 13:09:49 AST-mobile kernel: [ 580.159208] xhci_hcd
0000:00:14.0: Bad Slot ID 6
Jan 26 13:09:49 AST-mobile kernel: [ 580.159216] xhci_hcd
0000:00:14.0: Could not allocate xHCI USB device data structures
Jan 26 13:09:49 AST-mobile kernel: [ 580.159229] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Jan 26 13:10:46 AST-mobile kernel: [ 637.621728] xhci_hcd
0000:00:14.0: Bad Slot ID 7
Jan 26 13:10:46 AST-mobile kernel: [ 637.621736] xhci_hcd
0000:00:14.0: Could not allocate xHCI USB device data structures
Jan 26 13:10:46 AST-mobile kernel: [ 637.621748] hub 3-0:1.0:
couldn't allocate port 2 usb_device
Jan 26 13:11:10 AST-mobile kernel: [ 661.544389] xhci_hcd
0000:00:14.0: Bad Slot ID 8
Jan 26 13:11:10 AST-mobile kernel: [ 661.544392] xhci_hcd
0000:00:14.0: Could not allocate xHCI USB device data structures
Jan 26 13:11:10 AST-mobile kernel: [ 661.544400] hub 3-0:1.0:
couldn't allocate port 2 usb_device
This time scenario is a bit different (actually previous crashes used
the same but I have previously described the very first one, when I
noticed the problem) - host retrieves first 8 bytes of device
descriptor, and requests full descriptor after that. The device accepts
the request, returns the first 8 bytes (the device has 8 bytes maximal
packet size) and after that continues to send NAKs for all the next IN
tokens from the host (see
https://dl.dropbox.com/u/8276110/transaction_analysis.txt for dump).
Probably the problem occurs for all uncompleted transactions.
Also I have tried to plug device into ehci port, but was unable to
recreate the problem with it (nor crash, nor any suspicious messages in
dmesg). Possibly (I'm not sure, because sometimes I need many
plug/unplug retries to recreate the crash) the problem is specific to xhci.
Best regards,
Artyom.
On 01/26/2013 02:57 AM, Sarah Sharp wrote:
Can you please retry with the latest 3.7 stable kernel? The 3.5 kernel
didn't have support for command cancellation, which is why the Set
Address command is hanging there. The hung command sits on the command
ring, which means any following command doesn't get run. That's why you
get timeouts when you plug in a new device: the slot allocation command
for that new device never gets executed, because the xHCI host is still
stuck on the previous command.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html