Re: ehci_hcd: fatal error, HC died with usbaudio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 29 Mar 2010, Nate Case wrote:

> Hello,
> 
> I'm debugging a problem with USB audio CODECs that only occurs when
> using EHCI.  This is a custom board with a NXP EHCI/OHCI controller.
> The two ports are connected to USB hubs, and the downstream ports are
> connected to USB audio CODECs.

What version of the kernel are you using?  Have you tried using the 
most recent release?

> If I enable only the OHCI driver, I can use the CODECs properly (via the
> usbaudio driver and ALSA interface).  If I use the EHCI driver, I get
> the following messages in the kernel log after I attempt to playback
> audio:

The fact that you're doing isochronous I/O to a full-speed device 
through a high-speed hub complicates matters.  This is the one area 
where ehci-hcd still has significant problems.

> ehci_hcd 0000:07:0c.2: fatal error
> ehci_hcd 0000:07:0c.2: force halt; handhake c10ee024 00004000 00004000 -> -110
> ehci_hcd 0000:07:0c.2: HC died; cleaning up
> 
> The only thing I have to do to produce this error is play back audio on
> one of the CODECs two times, e.g.:
> 
> $ dd if=/dev/zero bs=8000 count=24 | aplay -D hw:0,0 -B 10000 -f S16_LE -r 48000
> Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono     
> 24+0 records in                                                                 
> 24+0 records out           
> $ dd if=/dev/zero bs=8000 count=24 | aplay -D hw:0,0 -B 10000 -f S16_LE
> -r 48000
> <error happens, HC dies>

Each of these should produce 2 seconds of silence, right?  24*8000 
bytes = 48000*2 samples (with 2 bytes/sample) = 2 seconds (at 48000 
Hz, mono).

> Playback the first time usually works OK, but readily fails the second
> time I run the above command.
> 
> I added a dump_stack() call just after the first EHCI error, and also
> enabled DEBUG in the ehci driver::
> 
> $ dd if=/dev/zero bs=8000 count=24 | aplay -D front:CARD=W681308,DEV=0 -B 10000 -f S16_LE -r 48000
> Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono
> 24+0 records in     
> 24+0 records out
> Call Trace:
> [bf83bd40] [80008c60] show_stack+0x48/0x15c (unreliable)
> [bf83bd80] [802c7140] ehci_irq+0x27c/0x3d4
> [bf83be20] [802ab8cc] usb_hcd_irq+0x50/0xa8
> [bf83be40] [80060948] handle_IRQ_event+0xa4/0x1c0
> [bf83be80] [80062d10] handle_fasteoi_irq+0x8c/0x128
> [bf83bea0] [80006aa4] do_IRQ+0xcc/0x104
> [bf83bed0] [800136e4] ret_from_except+0x0/0x14
> --- Exception: 501 at cpu_idle+0xb4/0xe0
>     LR = cpu_idle+0xd0/0xe0
> [bf83bf90] [80009d1c] cpu_idle+0x60/0xe0 (unreliable)
> [bf83bfb0] [803fff84] start_secondary+0x2e8/0x300
> [bf83bff0] [000034e0] 0x34e0
> ehci_hcd 0000:07:0c.1: fatal error
> ehci_hcd 0000:07:0c.1: fatal command 010018 (park)=0 ithresh=1 Periodic period=256 HALT
> ehci_hcd 0000:07:0c.1: fatal status 4018 Periodic FATAL FLR

These messages indicate that the controller failed, but they don't say 
what the reason for the failure was.

> ehci_hcd 0000:07:0c.1: reset command 01001a (park)=0 ithresh=1 Periodic period=256 Reset HALT
> usb 1-1: unlink qh256-0001/bf186180 start 255 [1/0 us]
> ehci_hcd 0000:07:0c.1: reused qh bf186180 schedule
> usb 1-1: link qh256-0001/bf186180 start 255 [1/0 us]
> usb 1-1: unlink qh256-0001/bf186180 start 255 [1/0 us]
> usb 1-2: unlink qh256-0001/bf186200 start 255 [1/0 us]
> ehci_hcd 0000:07:0c.1: force halt; handhake c10f2024 00004000 00004000 -> -110
> ehci_hcd 0000:07:0c.1: reused qh bf186200 schedule
> usb 1-2: link qh256-0001/bf186200 start 255 [1/0 us]
> usb 1-2: unlink qh256-0001/bf186200 start 255 [1/0 us]
> ehci_hcd 0000:07:0c.1: force halt; handhake c10f2024 00004000 00004000 -> -110
> ehci_hcd 0000:07:0c.1: HC died; cleaning up
> 
> I collected a usbmon trace for the error case and here is the tail end
> of it:
> 
> bf0df700 136558571 C Zo:1:004:2 0:1:161:0 1 0:0:96 0
> bf0df700 136558574 S Zo:1:004:2 -115:1:161 1 -18:0:96 96 D
> bf214900 136560572 C Zo:1:004:2 0:1:162:0 2 0:0:96 0:96:96 0
> bf214900 136560577 S Zo:1:004:2 -115:1:162 1 -18:0:96 96 D
> bf0df700 136561571 C Zo:1:004:2 0:1:164:0 1 0:0:96 0
> bf0df700 136561574 S Zo:1:004:2 -115:1:164 1 -18:0:96 96 D
> bf214900 136562571 C Zo:1:004:2 0:1:165:0 1 0:0:96 0
> bf214900 136562584 S Zo:1:004:2 -115:1:165 2 -18:0:96 -18:96:96 192 D
> bf0df700 136563572 C Zo:1:004:2 -104:1:166:0 1 0:0:96 0
> bf214900 136565572 C Zo:1:004:2 0:1:167:0 2 0:0:96 0:96:96 0
> bf0df780 136608830 C Zi:1:004:6 -104:64:211:0 1 0:0:3 0
> bf0df800 136608833 C Zi:1:004:6 -104:64:211:0 1 0:0:3 0
> bf0df900 136672837 C Zi:1:004:6 -104:64:19:0 1 0:0:3 0
> bf0df880 136672839 C Zi:1:004:6 -104:64:19:0 1 0:0:3 0
> bf05d080 136763823 C Ii:1:002:1 -108:2048 0
> bf05d280 136782772 C Ii:1:003:1 -108:2048 0
> bfa15c00 137120913 C Ii:1:001:1 -2:2048 0
> 
> Device 004: The USB audio CODEC that I'm playing audio through
> Device 001: EHCI host controller / root hub
> Device 002: Hub #1
> Device 003: Hub #2
> 
> In the usbmon trace, I basically see a bunch of isochronous output
> events as expected, and then at the end I see a few isochronous input
> events (even though I'm not recording?)

Those events are merely cancellations of input URBs (although the
start_frame and actual_length values don't make sense).  According to 
the interval values, similar input URBs should show up in a full usbmon 
trace every 64 milliseconds.

>  followed by these interrupt
> input callback events.  If I capture the first run where the EHCI
> controller doesn't die, I do not see those interrupt input events at the
> end; instead I see a control output submission/callback event pair.

That's because the controller didn't die, so the interrupt URBs weren't 
cancelled and the audio device was placed back in the default 
altsetting.

> e.g.:
> 
> (first run: this doesn't die)
> bf2136c0 115084046 S Zo:1:004:2 -115:1:190 2 -18:0:96 -18:96:96 192 D
> bf0df780 115085041 C Zo:1:004:2 0:1:191:0 1 0:0:96 0
> bf0df780 115085044 S Zo:1:004:2 -115:1:191 1 -18:0:96 96 D
> bf2136c0 115087041 C Zo:1:004:2 0:1:192:0 2 0:0:96 0:96:96 0
> bf2136c0 115087046 S Zo:1:004:2 -115:1:192 1 -18:0:96 96 D
> bf0df780 115088041 C Zo:1:004:2 0:1:194:0 1 0:0:96 0
> bf0df780 115088044 S Zo:1:004:2 -115:1:194 1 -18:0:96 96 D
> bf2136c0 115089041 C Zo:1:004:2 0:1:195:0 1 0:0:96 0
> bf2136c0 115089054 S Zo:1:004:2 -115:1:195 2 -18:0:96 -18:96:96 192 D
> bf0df780 115090041 C Zo:1:004:2 -104:1:196:0 1 0:0:96 0
> bf2136c0 115092041 C Zo:1:004:2 0:1:197:0 2 0:0:96 0:96:96 0
> bf0df980 115135300 C Zi:1:004:6 -104:64:241:0 1 0:0:3 0
> bf0df900 115135303 C Zi:1:004:6 -104:64:241:0 1 0:0:3 0
> bf0df800 115199307 C Zi:1:004:6 -104:64:49:0 1 0:0:3 0
> bf0df880 115199309 C Zi:1:004:6 -104:64:49:0 1 0:0:3 0
> bf0df980 115200047 S Co:1:004:0 s 01 0b 0000 0002 0000 0
> bf0df980 115200289 C Co:1:004:0 0 0
> 
> Any debugging clues/guidance would be greatly appreciated.

Use the most recent kernel version you can.  Post the complete usbmon 
traces, not just the last few entries.  To reduce the size of the 
output, you could play only 200 ms of silence instead of 2 seconds.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux