Re: linux-3.2: HW died, polling stopped.

On Wed, Mar 21, 2012 at 11:14:52PM +0100, Martin Mokrejs wrote:
> Dear Sarah and USB readers,
>   I have problems with USB3.0 in 3.2.2, 3.2.5 and 3.2.11 kernels. However, most
> of the "tests" I did in 3.2.11. For testing, I have a USB mouse connected via
> Evolve Express Card (NEC chip uPD720200) giving me two *additional* USB3.0 ports
> to my Dell Vostro 3550 laptop.

Ok, so you have a TI internal xHCI host controller and an external NEC
xHCI host that's attached via Express Card, correct?

Why do you need the Express Card, if you don't mind my asking?  Can't
you use a USB hub on your internally connected USB ports?

In general, I haven't had very good luck with xHCI Express Cards.  They
seem to work fine for a while, and then they get flaky and start
disconnecting all the time.  I think it's because I was carrying them
with me wherever I went, and they're just not designed to take that
abuse.  So it's possible you have a flaky Express Card, but also...

> From time to time, the mouse suddenly disappears from the system.
> I turned on some debugging in the kernel, and when I managed to capture
> the dmesg output soon enough, it showed this:
> 
> xhci_hcd 0000:11:00.0: Poll event ring: 4295920576
> xhci_hcd 0000:11:00.0: op reg status = 0xffffffff
> xhci_hcd 0000:11:00.0: HW died, polling stopped.

...Express Cards are rather easy to bump, especially when you have a
mouse attached to the port.  If you bump the express card, it will
electrically disconnect from the PCI express bus, and the registers will
read as all "f"s (as you can see from the op reg status).  Then the xHCI
host controller driver will signal to the USB core that all the USB
devices under your host disconnected.  If you jiggle the card again, it
may re-connect, and the xHCI driver will reload and re-enumerate the
device.
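
As a rough illustration of the "all f's" symptom (this is not driver
code, just a sketch for scanning a saved dmesg log), the Python below
flags any xHCI host whose operational status register read back as
0xffffffff.  The function names and the regular expression are
hypothetical and are modelled only on the debug lines quoted above; the
exact message format may differ between kernel versions.

```python
import re

# Matches the debug line quoted above, for example:
#   xhci_hcd 0000:11:00.0: op reg status = 0xffffffff
# The pattern is an assumption based on that one message format.
STATUS_RE = re.compile(
    r"xhci_hcd (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"op reg status = 0x(?P<val>[0-9a-fA-F]{1,8})"
)


def is_all_ones(value, width_bits=32):
    """True if every bit of a width_bits-wide register read is set."""
    return value == (1 << width_bits) - 1


def find_dead_hosts(log_lines):
    """Return the sorted PCI addresses whose status read as all ones.

    An all-ones read usually means the device is no longer answering
    on the bus, so the value is not a real register value.
    """
    dead = set()
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match and is_all_ones(int(match.group("val"), 16)):
            dead.add(match.group("dev"))
    return sorted(dead)
```

Feeding it the lines of a saved log (for instance
find_dead_hosts(open("again-stopped-xhci-on-3.2.11.txt")) on the
attached file) would list the hosts that dropped off the bus.  The same
all-ones pattern is why lspci shows "rev ff" and "prog-if ff" for a
device that has gone away.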

Maybe try moving the mouse and keyboard to a different port?  Or just
plug a USB 2.0 hub into your internally-connected USB 3.0 port.

>   I have attached a file again-stopped-xhci-on-3.2.11.txt where you can find this
> at about line 3332. Into the same file I then smashed the lspci, .config and lsusb
> output taken after the error occurred.
> 
>   The funny thing is that lspci once reported:
> 
> 11:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev ff) (prog-if ff)
>         !!! Unknown header type 7f
>         Kernel driver in use: xhci_hcd
> 
>   I then went to unplug the Express Card and re-plug again, so you will see this in
> the logs in that file as well.
> 
>   I am not sure whether I managed to reproduce the issue without other USB devices
> attached. I think I always had at least my external keyboard plugged into the
> eSATA/USB2.0 port.
>   However, in some tests you will see xhci_hcd 0000:0b:00.0 where is an external USB3.0
> controller with my external hard disk (but this is connected to the internal chipset
> inside the laptop, not via the Express card where the mouse is, for testing). The disk
> controller makes the sleep sleep after some time.

What do you mean by "The disk controller makes the sleep sleep after
some time"?

> I was not using the disk intentionally,
> just kept it plugged in during some of my tests.
> 
>   I haven't seen anything logged in /var/log/messages, only dmesg was giving output
> when the debug in USB and PCI was turned on.
> 
> 
> ***********
>   I have bits of logs of other attempts. Another replication is in stopped-xhci-on-3.2.11.txt
> file. There is again the "xhci_hcd 0000:11:00.0: HW died, polling stopped." message.
> 
> ***********
>   I think merely a pure bootup of the system is logged in new.dmesg2.txt file, just to give
> you an idea what hardware is this about. new.dmesg.txt has about the same value.
> 
> ***********
>   The file xhci-died-3.2.11.txt does not contain the "HW died" message, but maybe I just
> had not enabled the verbose logging yet? Based on the timestamp, it was the very first
> log file I wrote.
> 
> My apologies for this rather messy email. I just do not know where to start. ;)
> I have prepared the usbmon support in the kernel but maybe the verbose logging
> is already enough? Or is this a PCI Express Hotplug issue?
>
> P.S.: Could it be related to the MMAPPED IO options set in my .config?

Probably not.

> 0b:00.0 USB controller: Texas Instruments Device 8241 (rev 02) (prog-if 30 [XHCI])
> 	Subsystem: Dell Device 04b3
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 16
> 	Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K]
> 	Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [70] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
> 			ClockPM+ Surprise- LLActRep- BwNot-
> 		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
> 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis+
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
> 		Vector table: BAR=2 offset=00000000
> 		PBA: BAR=2 offset=00001000
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [150 v1] Device Serial Number 08-00-28-00-00-20-00-00
> 	Kernel driver in use: xhci_hcd

So you have an internal TI xHCI host controller as well?  How does that
work for you?

> 11:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev ff) (prog-if ff)
> 	!!! Unknown header type 7f
> 	Kernel driver in use: xhci_hcd
> 
>            CPU0       CPU1       CPU2       CPU3       
>   0:         58          0          0          0   IO-APIC-edge      timer
>   1:          9          0          0          0   IO-APIC-edge      i8042
>   8:         96          0          0          0   IO-APIC-edge      rtc0
>   9:          6          0          0          0   IO-APIC-fasteoi   acpi
>  12:       6149          0          0          0   IO-APIC-edge      i8042
>  16:       7976          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
>  23:        180          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
>  40:          0          0          0          0  DMAR_MSI-edge      dmar0
>  41:          0          0          0          0  DMAR_MSI-edge      dmar1
>  42:          4          0          0          0   PCI-MSI-edge      pciehp
>  43:     222595          0          0          0   PCI-MSI-edge      i915
>  44:      40867          0          0          0   PCI-MSI-edge      ahci
>  45:          0          0          0          0   PCI-MSI-edge      eth0
>  46:      43409          0          0          0   PCI-MSI-edge      xhci_hcd
>  47:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  48:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  49:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  50:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  51:     101576          0          0          0   PCI-MSI-edge      xhci_hcd
>  52:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  53:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  54:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  55:          0          0          0          0   PCI-MSI-edge      xhci_hcd
>  56:         14          0          0          0   PCI-MSI-edge      mei
>  57:        270          0          0          0   PCI-MSI-edge      snd_hda_intel
>  58:     374411          0          0          0   PCI-MSI-edge      iwlwifi
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:     551563     358343     434805     521638   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          0          0          0          0   Performance monitoring interrupts
> IWI:          0          0          0          0   IRQ work interrupts
> RES:    1209730    1192147    1096504    1242733   Rescheduling interrupts
> CAL:        121        173        166         83   Function call interrupts
> TLB:       2803       4764       4722       6607   TLB shootdowns
> TRM:          0          0          0          0   Thermal event interrupts
> THR:          0          0          0          0   Threshold APIC interrupts
> MCE:          0          0          0          0   Machine check exceptions
> MCP:         34         34         34         34   Machine check polls
> ERR:          0
> MIS:          0

Sarah Sharp

