Re: Fwd: PROBLEM: Permanent kernel panic in USB hub driver - 3.5.0-22

On Sat, Jan 26, 2013 at 02:11:30PM +0200, Artemy Lebedev wrote:
> Hi,
> Today I have "successfully" reproduced the bug on the 3.7.4 kernel. The
> symptoms look to be exactly the same: a null pointer dereference in
> interrupt context, somewhere in the xhci IRQ handler. Unfortunately, this
> time I have no core dumps. For some reason the crash kernel is not
> started after the crash, although dmesg shows that the space is
> preallocated for it.

It's probably moot at this point, but have you tried using netconsole to
capture the crash instead of the crash kernel functionality?  I wrote up
a netconsole tutorial on kernelnewbies.org, if you'd like to give it a
try the next time it crashes:

http://kernelnewbies.org/KernelDebug

> However, I think the root cause is the same as
> in the previous cores, so it would still be worth analyzing them.
> Here is a new picture of the panic:
> https://dl.dropbox.com/u/8276110/3.7.4%20panic.jpg

Do you have the UAS driver compiled in?  I see some functions that could
only be called after the UAS driver allocates a streams context (i.e.
xhci_stream_id_to_ring).  It doesn't seem to be related to the Set
Address timeout crash that was your previous issue.

I think this could be related to the bug that Gerd reported about the
additional stream ring segments not being added to the streams radix
tree.  The ring expansion code to add more ring segments on demand was
added after the streams code was merged, and I don't think anyone has
tested that particular combination yet.
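
To make the suspected failure concrete, here is a toy model of it, not
kernel code. A plain dict stands in for the streams radix tree, and the
segment size, the addresses, and every name in the sketch are assumptions
chosen for illustration. It only shows why a TRB in a later segment cannot
be mapped back to its stream ring if ring expansion never updates the map.

```python
# Toy model, not kernel code: a dict stands in for the radix tree that maps
# a TRB's DMA address to the stream ring that owns it.
SEGMENT_SIZE = 1024          # assumed segment size in bytes, for illustration
SEGMENT_SHIFT = 10           # log2(SEGMENT_SIZE)


class StreamRing:
    def __init__(self, name, first_segment_addr, address_map):
        self.name = name
        self.segments = [first_segment_addr]
        # The initial segment is always registered in the map.
        address_map[first_segment_addr >> SEGMENT_SHIFT] = self

    def expand(self, new_segment_addr, address_map, update_map):
        """Link one more segment; register it only if update_map is set."""
        self.segments.append(new_segment_addr)
        if update_map:
            address_map[new_segment_addr >> SEGMENT_SHIFT] = self


def ring_for_trb(address_map, trb_addr):
    """Return the ring owning trb_addr, or None if its segment is unknown."""
    return address_map.get(trb_addr >> SEGMENT_SHIFT)


def demo(update_map):
    address_map = {}
    ring = StreamRing("stream 1", 0x10000000, address_map)  # made-up address
    ring.expand(0x10001000, address_map, update_map)         # made-up address
    # A transfer event that points at a TRB inside the second segment.
    return ring_for_trb(address_map, 0x10001000 + 0x10)


print(demo(update_map=False))        # lookup fails: segment was never added
print(demo(update_map=True).name)    # lookup succeeds once it is registered
```

In the model, the failed lookup corresponds to the driver being unable to
find a stream ring for the TRB named in a transfer event.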

I'll take a look at the streams ring code and see if I can find the
issue.

Sarah Sharp

> >$ cat /proc/version
> >Linux version 3.7.4 (vagran@AST-mobile) (gcc version 4.7.2
> >(Ubuntu/Linaro 4.7.2-2ubuntu1) ) #1 SMP Sat Jan 26 12:24:36 EET
> >2013
> 
> The warning followed by the bus malfunction also still persists:
> >Jan 26 13:03:42 AST-mobile kernel: [ 231.054897] usb 3-2: new
> >low-speed USB device number 14 using xhci_hcd
> >Jan 26 13:03:47 AST-mobile kernel: [  236.067310] usb 3-2: device
> >descriptor read/all, error -110
> >Jan 26 13:03:47 AST-mobile kernel: [  236.179193] usb 3-2: new
> >low-speed USB device number 15 using xhci_hcd
> >Jan 26 13:03:52 AST-mobile kernel: [  241.175598] xhci_hcd
> >0000:00:14.0: Timeout while waiting for address device command
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379407] ------------[
> >cut here ]------------
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379426] WARNING: at
> >drivers/usb/host/xhci.c:3657 xhci_address_device+0x2db/0x300()
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379429] Hardware name: 24382LU
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379431] Modules linked
> >in: ftdi_sio usbserial bnep rfcomm parport_pc ppdev binfmt_misc
> >ext2 arc4 iwldvm mac80211 coretemp snd_hda_codec_realtek kvm_intel
> >kvm snd_hda_intel snd_hda_codec iwlwifi snd_hwdep snd_pcm
> >thinkpad_acpi ghash_clmulni_intel aesni_intel ablk_helper btusb
> >snd_seq_midi cryptd snd_rawmidi bluetooth lrw snd_seq_midi_event
> >aes_x86_64 snd_seq xts cfg80211 gf128mul psmouse snd_timer
> >snd_seq_device microcode serio_raw joydev snd nvram tpm_tis
> >lpc_ich soundcore mei snd_page_alloc mac_hid usbmon lp parport
> >nouveau hid_generic i915 sdhci_pci sdhci usbhid hid ttm
> >firewire_ohci drm_kms_helper firewire_core e1000e drm crc_itu_t
> >i2c_algo_bit mxm_wmi wmi video
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379505] Pid: 57, comm:
> >khubd Not tainted 3.7.4 #1
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379507] Call Trace:
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379518]
> >[<ffffffff810575bf>] warn_slowpath_common+0x7f/0xc0
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379523]
> >[<ffffffff8105761a>] warn_slowpath_null+0x1a/0x20
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379528]
> >[<ffffffff814ea5cb>] xhci_address_device+0x2db/0x300
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379535]
> >[<ffffffff814bd3c9>] hub_port_init+0x229/0x9f0
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379543]
> >[<ffffffff8143763d>] ? pm_runtime_set_autosuspend_delay+0x5d/0x80
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379551]
> >[<ffffffff81044b49>] ? default_spin_lock_flags+0x9/0x10
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379556]
> >[<ffffffff814c051f>] hub_thread+0x54f/0x1380
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379563]
> >[<ffffffff8107c910>] ? finish_wait+0x80/0x80
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379568]
> >[<ffffffff814bffd0>] ? usb_remote_wakeup+0x40/0x40
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379572]
> >[<ffffffff8107bfc0>] kthread+0xc0/0xd0
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379577]
> >[<ffffffff8107bf00>] ? kthread_create_on_node+0x130/0x130
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379584]
> >[<ffffffff816a102c>] ret_from_fork+0x7c/0xb0
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379588]
> >[<ffffffff8107bf00>] ? kthread_create_on_node+0x130/0x130
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379591] ---[ end trace
> >b3ec4f772c11ce62 ]---
> >Jan 26 13:03:53 AST-mobile kernel: [  241.379598] xhci_hcd
> >0000:00:14.0: Virt dev invalid for slot_id 0xe!
> >Jan 26 13:03:53 AST-mobile kernel: [  241.583272] usb 3-2: device
> >not accepting address 15, error -22
> >Jan 26 13:03:53 AST-mobile kernel: [  241.583296] xHCI
> >xhci_free_dev called with unaddressed device
> >Jan 26 13:03:58 AST-mobile kernel: [  246.579674] xhci_hcd
> >0000:00:14.0: Timeout while waiting for a slot
> >Jan 26 13:03:58 AST-mobile kernel: [  246.579692] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> >Jan 26 13:04:18 AST-mobile kernel: [  266.681173] xhci_hcd
> >0000:00:14.0: Timeout while waiting for a slot
> >Jan 26 13:04:18 AST-mobile kernel: [  266.681191] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> >
> >Jan 26 13:05:19 AST-mobile kernel: [  328.100883] xhci_hcd
> >0000:00:14.0: Timeout while waiting for a slot
> >Jan 26 13:05:19 AST-mobile kernel: [  328.100902] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> >Jan 26 13:05:32 AST-mobile kernel: [  341.027570] xhci_hcd
> >0000:00:14.0: Timeout while waiting for a slot
> >Jan 26 13:05:32 AST-mobile kernel: [  341.027590] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> 
> I also found another case of the bus malfunction, in which some allocation
> failures are reported. As with the previous problem, it continues
> even when a normal device is plugged in after the bad one:
> >Jan 26 13:09:38 AST-mobile kernel: [ 569.791564] usb 3-2: new
> >low-speed USB device number 31 using xhci_hcd
> >Jan 26 13:09:43 AST-mobile kernel: [  574.808971] usb 3-2: device
> >descriptor read/all, error -110
> >Jan 26 13:09:44 AST-mobile kernel: [  574.920977] usb 3-2: new
> >low-speed USB device number 32 using xhci_hcd
> >Jan 26 13:09:49 AST-mobile kernel: [  579.935365] usb 3-2: device
> >descriptor read/8, error -110
> >Jan 26 13:09:49 AST-mobile kernel: [  580.056632] usb 3-2: device
> >descriptor read/8, error -71
> >Jan 26 13:09:49 AST-mobile kernel: [  580.159208] xhci_hcd
> >0000:00:14.0: Bad Slot ID 6
> >Jan 26 13:09:49 AST-mobile kernel: [  580.159216] xhci_hcd
> >0000:00:14.0: Could not allocate xHCI USB device data structures
> >Jan 26 13:09:49 AST-mobile kernel: [  580.159229] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> >
> >Jan 26 13:10:46 AST-mobile kernel: [  637.621728] xhci_hcd
> >0000:00:14.0: Bad Slot ID 7
> >Jan 26 13:10:46 AST-mobile kernel: [  637.621736] xhci_hcd
> >0000:00:14.0: Could not allocate xHCI USB device data structures
> >Jan 26 13:10:46 AST-mobile kernel: [  637.621748] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> >
> >Jan 26 13:11:10 AST-mobile kernel: [  661.544389] xhci_hcd
> >0000:00:14.0: Bad Slot ID 8
> >Jan 26 13:11:10 AST-mobile kernel: [  661.544392] xhci_hcd
> >0000:00:14.0: Could not allocate xHCI USB device data structures
> >Jan 26 13:11:10 AST-mobile kernel: [  661.544400] hub 3-0:1.0:
> >couldn't allocate port 2 usb_device
> This time the scenario is a bit different (actually, the previous crashes
> used the same one, but I had described only the very first one,
> when I noticed the problem): the host retrieves the first 8 bytes of the
> device descriptor and requests the full descriptor after that. The device
> accepts the request, returns the first 8 bytes (the device has an 8-byte
> maximum packet size), and after that keeps sending NAKs for
> all subsequent IN tokens from the host (see
> https://dl.dropbox.com/u/8276110/transaction_analysis.txt for a dump).
> The problem probably occurs for all uncompleted transactions.
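
As a rough illustration of the transfer described above: a standard USB
device descriptor is 18 bytes long, so with an 8-byte maximum packet size
the full read needs three data packets. The helper name below is made up
for this sketch, and the "device answers once, then NAKs" step simply
mirrors the description above rather than the linked dump.

```python
DEVICE_DESCRIPTOR_LENGTH = 18   # bytes, fixed by the USB specification


def data_packets(total_length, max_packet_size):
    """Split a control IN data stage into per-packet payload sizes."""
    sizes = []
    remaining = total_length
    while remaining > 0:
        chunk = min(remaining, max_packet_size)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


expected = data_packets(DEVICE_DESCRIPTOR_LENGTH, 8)
delivered = expected[:1]          # the device answers once, then only NAKs
missing = DEVICE_DESCRIPTOR_LENGTH - sum(delivered)
print(expected, missing)          # prints: [8, 8, 2] 10
```

So 10 of the 18 bytes never arrive, and the transfer stays incomplete until
the host gives up.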
> 
> I have also tried plugging the device into an ehci port, but was unable to
> recreate the problem there (neither a crash nor any suspicious messages
> in dmesg). Possibly (I'm not sure, because sometimes I need many
> plug/unplug retries to recreate the crash) the problem is specific
> to xhci.
> 
> Best regards,
> Artyom.
> 
> On 01/26/2013 02:57 AM, Sarah Sharp wrote:
> >Can you please retry with the latest 3.7 stable kernel?  The 3.5 kernel
> >didn't have support for command cancellation, which is why the Set
> >Address command is hanging there.  The hung command sits on the command
> >ring, which means any following command doesn't get run.  That's why you
> >get timeouts when you plug in a new device: the slot allocation command
> >for that new device never gets executed, because the xHCI host is still
> >stuck on the previous command.
> 
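
The stall described above can be pictured with a small simulation. This is
only a sketch under simplifying assumptions: the class, the method names,
and the timeout-then-cancel behaviour are invented for illustration and do
not reflect the driver's actual data structures.

```python
from collections import deque


class CommandRing:
    """Toy FIFO: the host works on the head command until it completes."""

    def __init__(self, supports_cancel):
        self.supports_cancel = supports_cancel
        self.pending = deque()
        self.completed = []

    def queue(self, name, hangs=False):
        self.pending.append((name, hangs))

    def run(self):
        while self.pending:
            name, hangs = self.pending[0]
            if hangs:
                if not self.supports_cancel:
                    # Nothing removes the hung command, so the ring stalls.
                    return
                self.pending.popleft()   # driver times out and cancels it
                continue
            self.pending.popleft()
            self.completed.append(name)


def simulate(supports_cancel):
    ring = CommandRing(supports_cancel)
    ring.queue("Address Device (bad device)", hangs=True)
    ring.queue("Enable Slot (new device)")
    ring.run()
    return ring.completed


print(simulate(supports_cancel=False))   # prints: []
print(simulate(supports_cancel=True))    # prints: ['Enable Slot (new device)']
```

Without cancellation the slot request for the new device is never reached,
which matches the "Timeout while waiting for a slot" messages.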

On Thu, Jan 31, 2013 at 10:25:43AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> Started hacking streams support into qemu and ran into this one:
> 
> [  218.807129] xhci_hcd 0000:00:0f.0: ERROR Transfer event for disabled
> endpoint or incorrect stream ring
> [  218.808087] xhci_hcd 0000:00:0f.0: @000000003c32d560 38342000
> 00000000 01000000 01078001
> 
> It triggers after the xhci emulation steps over the first link TRB of a
> stream ring.
> 
> I think it's because xhci doesn't manage the trb_address_map radix tree
> correctly.  I can only find a single radix_tree_insert() call in the
> code, and that one is for the initial segment.  But nobody seems to
> update the radix tree when linking the next segment ...
> 
> cheers,
>   Gerd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
