Re: Fwd: PROBLEM: Permanent kernel panic in USB hub driver - 3.5.0-22

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Artemy,

I'm the xHCI driver maintainer, so if you run into issues
with your USB host controller, please email me and cc
linux-usb@xxxxxxxxxxxxxxx.  I'm not subscribed to LKML, so that's the
easiest way to get bugs fixed.  (Thanks for spotting this Martin!)

Comments below.

On Fri, Jan 25, 2013 at 11:03:55PM +0100, Martin Mokrejs wrote:
> [Forwarding from lkml]

> Date:	Fri, 25 Jan 2013 23:54:06 +0200
> From: Artemy Lebedev <artyom.lebedev@xxxxxxxxx>
> To: linux-kernel@xxxxxxxxxxxxxxx
> Subject: PROBLEM: Permanent kernel panic in USB hub driver - 3.5.0-22
> List-ID: <linux-kernel.vger.kernel.org>
> 
> Hi,
> I'm developing USB device with software implementation of USB
> protocol stack (even link level). It is not yet fully completed so
> it behaves wrong from host point of view. However now it is a good
> stress testing tool for the OS which discovered a couple of
> problems. I understand that the use case is quite unusual but still
> the problem is obvious. I have a permanent scenario for making a
> kernel panic.
> 
> So the scenario is quite simple - somewhere in the middle of USB
> control transaction the device stops to answer host requests (just
> silence) or in another scenario the device continuosly responds with
> NAKs to IN tokens. I have attached the file which contains some
> system information as well as description of the signaling on USB
> bus. There is an oscillogram capture in CSV format (volts on D- and
> D+ lines) changed in time (signal.csv.zip). For your convenience, I
> have parsed it by my analyzing script, see its output in
> signal_analysis.txt. Here it is seen that host sends "Get device
> descriptor" request, the device acknowledges the DATA0 packet by ACK
> in frame 3 and responds by NAK in frame 5 for IN token of status
> stage in frame 4. All the following IN tokens are ignored and host
> retries the whole transaction from the beginning while the device
> still keep silence.
> This behavior leads to two bad effects on host. The first one is the
> following message in dmesg:
> 
> Jan 23 20:39:56 AST-mobile kernel: [ 355.978595] usb 3-2: new
> low-speed USB device number 8 using xhci_hcd
> Jan 23 20:40:01 AST-mobile kernel: [ 360.977493] xhci_hcd
> 0000:00:14.0: Timeout while waiting for address device command
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181444] ------------[ cut > here ]------------
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181461] WARNING: at /build/buildd/linux-3.5.0/drivers/usb/host/xhci.c:3646 xhci_address_device+0x2db/0x300()
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181464] Hardware name: 24382LU
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181466] Modules linked in:
> btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos
> jfs xfs reiserfs bnep rfcomm parport_pc ppdev binfmt_misc ext2
> coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel cryptd arc4
> aes_x86_64 snd_hda_codec_realtek microcode joydev uvcvideo
> videobuf2_core videodev thinkpad_acpi videobuf2_vmalloc iwlwifi
> lpc_ich videobuf2_memops psmouse mac80211 ftdi_sio serio_raw
> usbserial btusb snd_hda_intel snd_seq_midi snd_hda_codec bluetooth
> cfg80211 snd_hwdep snd_rawmidi snd_pcm snd_seq_midi_event snd_seq
> snd_seq_device snd_timer snd_page_alloc snd soundcore mei nvram
> tpm_tis mac_hid usbmon lp parport hid_generic usbhid hid i915
> nouveau firewire_ohci ttm drm_kms_helper firewire_core crc_itu_t drm
> i2c_algo_bit mxm_wmi sdhci_pci sdhci e1000e wmi video
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181552] Pid: 48, comm: > khubd Not tainted 3.5.0-22-generic #34-Ubuntu
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181555] Call Trace:
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181567] [<ffffffff81051c1f>] warn_slowpath_common+0x7f/0xc0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181572] [<ffffffff81051c7a>] warn_slowpath_null+0x1a/0x20
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181578] [<ffffffff814dab8b>] xhci_address_device+0x2db/0x300
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181585] [<ffffffff814ae599>] hub_port_init+0x229/0x9f0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181594] [<ffffffff8142ba5d>] ? pm_runtime_set_autosuspend_delay+0x5d/0x80
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181600] [<ffffffff8103fa39>] ? default_spin_lock_flags+0x9/0x10
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181606] [<ffffffff814b1e03>] hub_port_connect_change+0x283/0x9c0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181611] [<ffffffff814b2943>] hub_events+0x403/0x690
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181619] [<ffffffff8168244f>] ? __schedule+0x3cf/0x7c0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181624] [<ffffffff814b2c05>] hub_thread+0x35/0x1a0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181632] [<ffffffff810768b0>] ? finish_wait+0x80/0x80
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181637] [<ffffffff814b2bd0>] ? hub_events+0x690/0x690
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181642] [<ffffffff81075f13>] kthread+0x93/0xa0
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181648] [<ffffffff8168d024>] kernel_thread_helper+0x4/0x10
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181653] [<ffffffff81075e80>] ? kthread_freezable_should_stop+0x70/0x70
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181657] [<ffffffff8168d020>] ? gs_change+0x13/0x13
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181660] ---[ end trace f87cad0b99ce3880 ]---
> Jan 23 20:40:01 AST-mobile kernel: [ 361.181665] xhci_hcd 0000:00:14.0: Virt dev invalid for slot_id 0x7!
> Jan 23 20:40:01 AST-mobile kernel: [ 361.385433] usb 3-2: device not accepting address 8, error -22
> Jan 23 20:40:01 AST-mobile kernel: [ 361.385456] xHCI xhci_free_dev called with unaddressed device
> Jan 23 20:40:06 AST-mobile kernel: [ 366.384344] xhci_hcd 0000:00:14.0: Timeout while waiting for a slot
> Jan 23 20:40:06 AST-mobile kernel: [ 366.384375] hub 3-0:1.0: couldn't allocate port 2 usb_device

Can you please retry with the latest 3.7 stable kernel?  The 3.5 kernel
didn't have support for command cancellation, which is why the Set
Address command is hanging there.  The hung command sits on the command
ring, which means any following command doesn't get run.  That's why you
get timeouts when you plug in a new device: the slot allocation command
for that new device never gets executed, because the xHCI host is still
stuck on the previous command.

> I did not pay attention to kernel warnings about timeouts since they
> are expected, but the backtrace indicates that something bad
> occurred in the kernel (LKM crash?). After that whole the bus
> becomes unusable. Even if the normal device is plugged into the
> port, it is not handled and the following messages continuously
> repeating:
> Jan 23 20:40:06 AST-mobile kernel: [ 366.384344] xhci_hcd 0000:00:14.0: Timeout while waiting for a slot
> Jan 23 20:40:06 AST-mobile kernel: [ 366.384375] hub 3-0:1.0:
> couldn't allocate port 2 usb_device
> This ends only after the bus controller is powered off and on (e.g.
> sleep the system and wake up again).
> 
> The second problem is even more interesting, it's a kernel panic.
> Unfortunately I did find how to get the panic messages from the
> kernel persistent ring buffer (if there is such) after reboot. So I
> have taken a photo, available here:
> https://dl.dropbox.com/u/8276110/2013-01-25%2022.49.16.jpg
> And, of course, the sweets - two kernel core dumps (possibly the
> same, but just for a case created them twice):
> https://dl.dropbox.com/u/8276110/vmcore.gz
> https://dl.dropbox.com/u/8276110/vmcore2.gz

I'm getting on a plane in a few minutes, so I'll have to take a look at
them later.  Please let me know if you can reproduce these on 3.7.

> So, that's all I have. See attached file and cores. Also if there
> will be a patch I can test it with my setup (required firmware
> version of my device is specially tagged as "Linux exploit" in my
> repository :).
> 
> P.S. Sorry, probably I have violated all rules of writing to mail
> list and bug reporting but I did not managed to find clear
> instructions how it should be done. Bugzilla seems to be only for
> 2.6 version. Where to put big attachments (like core dumps) is also
> unclear. So this procedure for Linux kernel is toо unobvious for
> ordinary system developer.

Yeah, we kernel developers suck at documentation and keeping our
bugzilla database up to date. :)  As I mentioned, just send me email and
cc the Linux USB mailing list.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux