Hi Sarah,

does anyone have any comments on this thread? I just retried with the 3.8.8
kernel and the issue is still the same. I can set the upstream 1c.4 port to
'auto' and detach the mouse, and 1c.4 does not suspend (due to a recent patch,
I think around 3.8.5). If I also set its downstream 0b:00 to 'auto' and plug in
a mouse, the mouse works; after I unplug the mouse, 0b:00 goes 'suspended' and
the XHCI socket dies.

Here is a comparison of the 'active' state and the 'suspended to death' state
(note pcie_aspm=off on my kernel command line):

--- lspci_vvv_initial.txt 2013-04-20 00:16:11.000000000 +0200
+++ lspci_vvv_initial__mouse_attached__detached__attached__1c.4_to_auto__detached__0b:00_to_auto.txt 2013-04-20 00:18:38.000000000 +0200
@@ -484,15 +484,14 @@

 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
     Subsystem: Dell Device 04b3
-    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-    Latency: 0, Cache Line Size: 64 bytes
     Interrupt: pin A routed to IRQ 16
     Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K]
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
+        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00

If I put 0b:00/control back to 'on', I rescue the XHCI socket.

So, should the TI host be blacklisted so that it is never put into a suspended
state?
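For anyone who wants to repeat the 'auto'/'on' switching described above, it
could be sketched roughly as below. This is only a minimal sketch and not
something taken from my test scripts: the function name set_runtime_pm and the
PCI_SYSFS override variable are made up for illustration, and the full device
address 0000:0b:00.0 assumes PCI domain 0000.

```shell
#!/bin/sh
# Sketch only: set_runtime_pm and PCI_SYSFS are hypothetical names.
# PCI_SYSFS may be pointed at a scratch directory for a dry run.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

# set_runtime_pm DEVICE MODE
#   DEVICE: full PCI address, e.g. 0000:0b:00.0 (domain 0000 is an assumption)
#   MODE:   'auto' (runtime suspend allowed) or 'on' (device kept active)
set_runtime_pm() {
    dev="$1"
    mode="$2"
    case "$mode" in
        on|auto) ;;
        *) echo "mode must be 'on' or 'auto'" >&2; return 2 ;;
    esac
    ctl="$PCI_SYSFS/$dev/power/control"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl" >&2
        return 1
    fi
    # Write the requested mode into the device's runtime PM control file.
    printf '%s\n' "$mode" > "$ctl"
}

# Usage (as root):
#   set_runtime_pm 0000:0b:00.0 auto   # allow the TI host to suspend
#   set_runtime_pm 0000:0b:00.0 on     # the rescue described above
```

The mode check is there only so that a typo does not end up in the control
file; the kernel itself accepts just these two values.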
I already wrote that I don't think blacklisting is necessary, but it looks like
nobody has looked into the lspci files. So, here is my interpretation, based on
another test scenario:

1. When I boot up without any devices attached to the TI host (no
laptop-mode-tools), the TI host at 0b:00 is active.

2. If I enable power saving by setting the control files of 1c.4 (just to be
sure) and 0b:00 to 'auto', 0b:00 goes suspended after a while. But it is not
dead: if I connect a mouse to the XHCI socket, it works. But look at what such
a 'softly suspended' state looks like:

# diff -u -w lspci_vvv_initial.txt lspci_vvv_initial__1c.4_and_0b:00_to_auto.txt
--- lspci_vvv_initial.txt 2013-04-20 01:06:51.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto.txt 2013-04-20 01:08:46.000000000 +0200
@@ -484,15 +484,14 @@

 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
     Subsystem: Dell Device 04b3
-    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-    Latency: 0, Cache Line Size: 64 bytes
     Interrupt: pin A routed to IRQ 16
     Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K]
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
+        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00
#

3. Now look what happens if I plug in a mouse (it works, as I said) and unplug
it, which triggers a deadly, although reversible, suspend:

# diff -u -w lspci_vvv_initial__1c.4_and_0b:00_to_auto.txt lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached.txt
--- lspci_vvv_initial__1c.4_and_0b:00_to_auto.txt 2013-04-20 01:08:46.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached.txt 2013-04-20 01:10:06.000000000 +0200
@@ -271,7 +271,7 @@
             Changed: MRL- PresDet- LinkState+
         RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
         RootCap: CRSVisible-
-        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
+        RootSta: PME ReqID 0b00, PMEStatus- PMEPending-
         DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-

4. Interestingly, if I connect a mouse to the socket to show that it is "dead",
there is a tiny change in lspci:

--- lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached.txt 2013-04-20 01:10:06.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead.txt 2013-04-20 01:10:28.000000000 +0200
@@ -491,7 +491,7 @@
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
+        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME+
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00

5. As I said, the port 'suspended to death' can be rescued by
echo 'on' > .../*0b:00*/control (the mouse was plugged in during the echo
command, so we see not only the PME changes but also the D3 to D0 change,
because the mouse is attached):

# diff -u -w lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead.txt lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues.txt
--- lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead.txt 2013-04-20 01:10:28.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues.txt 2013-04-20 01:12:25.000000000 +0200
@@ -484,14 +484,15 @@

 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
     Subsystem: Dell Device 04b3
-    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+    Latency: 0, Cache Line Size: 64 bytes
     Interrupt: pin A routed to IRQ 16
     Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K]
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME+
+        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00

6. When I unplug the mouse, the port of course does not die, because the
control file is set to 'on'.
I already demonstrated that, but once again, setting 0b:00 to 'auto':

# diff -u -w lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues__detached.txt lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues__detached__0b\:00_to_auto.txt
--- lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues__detached.txt 2013-04-20 01:13:36.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues__detached__0b:00_to_auto.txt 2013-04-20 01:14:41.000000000 +0200
@@ -484,15 +484,14 @@

 0b:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) (prog-if 30 [XHCI])
     Subsystem: Dell Device 04b3
-    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-    Latency: 0, Cache Line Size: 64 bytes
     Interrupt: pin A routed to IRQ 16
     Region 0: Memory at f7d00000 (64-bit, non-prefetchable) [size=64K]
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
+        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00
@@ -521,7 +520,7 @@
         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
-        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
+        CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+
         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
     Capabilities: [150 v1] Device Serial Number 08-00-28-00-00-20-00-00

7. Now, a question to the reader: if I attach the mouse, will it work or not?

# diff -u -w lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues__detached__0b\:00_to_auto.txt lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues__detached__0b\:00_to_auto__attached_dead.txt
--- lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues__detached__0b:00_to_auto.txt 2013-04-20 01:14:41.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues__detached__0b:00_to_auto__attached_dead.txt 2013-04-20 01:17:59.000000000 +0200
@@ -491,7 +491,7 @@
     Region 2: Memory at f7d10000 (64-bit, non-prefetchable) [size=8K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=100mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
+        Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME+
     Capabilities: [48] MSI: Enable- Count=1/8 Maskable- 64bit+
         Address: 0000000000000000 Data: 0000
     Capabilities: [70] Express (v2) Endpoint, MSI 00
#

No, it did not work. The situation in step 7 is the same as in step 4.
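The D3 plus PME+ pattern from steps 4 and 7 can also be pulled out of the saved
lspci files without reading whole diffs. Below is a minimal sketch of how that
might be done. The function names pm_status and shows_pending_pme are made up
for illustration, the device address is expected in the short form that lspci
prints (e.g. 0b:00.0), and the check is only a heuristic for the state I
observed, not a general test for a dead port.

```shell
#!/bin/sh
# Sketch only: pm_status and shows_pending_pme are hypothetical helper names.

# pm_status FILE DEVICE
#   Print the Power Management "Status: Dx ..." line of DEVICE (e.g. 0b:00.0)
#   from a saved 'lspci -vvv' output file.
pm_status() {
    awk -v dev="$2" '
        # A device header starts in column 0, e.g. "0b:00.0 USB controller: ..."
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / {
            in_dev = ($1 == dev); in_pm = 0
        }
        # Remember that we entered the Power Management capability.
        in_dev && /Power Management version/ { in_pm = 1; next }
        # The first "Status: Dx" line after that is the PM status.
        in_dev && in_pm && /Status: D[0-3]/ {
            sub(/^[ \t]+/, ""); print; exit
        }
    ' "$1"
}

# shows_pending_pme FILE DEVICE
#   Succeed if DEVICE sits in D3 with PME+ set, as in steps 4 and 7.
shows_pending_pme() {
    case "$(pm_status "$1" "$2")" in
        "Status: D3 "*" PME+"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   pm_status lspci_vvv_initial.txt 0b:00.0
#   shows_pending_pme some_lspci_vvv_file.txt 0b:00.0 && echo "D3 with PME+"
```

Matching on " PME+" with the leading space keeps "PME-Enable+" from being
mistaken for the pending PME bit.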
The diff below is likely benign:

# diff -u -w lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead.txt lspci_vvv_initial__1c.4_and_0b\:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b\:00_to_on_rescues__detached__0b\:00_to_auto__attached_dead.txt
--- lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead.txt 2013-04-20 01:10:28.000000000 +0200
+++ lspci_vvv_initial__1c.4_and_0b:00_to_auto__mouse_attached_and_works__detached__reattached_but_dead__0b:00_to_on_rescues__detached__0b:00_to_auto__attached_dead.txt 2013-04-20 01:17:59.000000000 +0200
@@ -520,7 +520,7 @@
         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
-        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
+        CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+
         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
     Capabilities: [150 v1] Device Serial Number 08-00-28-00-00-20-00-00
#

The collected data are at http://195.113.57.32/~mmokrejs/tmp/20130420.tar.bz2 (90kB)

Thanks,
Martin

Martin Mokrejs wrote:
>
>
> Huang Ying wrote:
>> Hi, Martin,
>>
>> On Wed, 2013-04-03 at 14:16 +0200, Martin Mokrejs wrote:
>>> Meanwhile, the raw data: http://195.113.57.32/~mmokrejs/tmp/20130402.tar.bz2
>>> (size 468641 bytes)
>>
>> Thanks a lot! Your information is very complete and clear :)
>>
>>> They were collected by:
>>>
>>> # cat ~/bin/collect_runtime_status.sh
>>> #!/bin/sh
>>> grep . /sys/bus/pci/devices/*/power/runtime_status > runtime_status_"$1".txt
>>> grep . /sys/bus/pci/devices/*/power/control > control_"$1".txt
>>> cat /proc/interrupts > interrupts_"$1".txt
>>> cat /proc/iomem > iomem_"$1".txt
>>> lspci -vvv > lspci_vvv_"$1".txt
>>> dmesg > dmesg_"$1".txt
>>> #
>>>
>>> Just do 'ls -latr' to see the ordering of the files as they were created.
>>> The longer the filename, the later in the test process. The names should be
>>> relatively self-explaining. Definitely, from the log files you should see
>>> what happened in real and therefore, can figure out what the (maybe weird)
>>> long filename really meant.
>>>
>>> Sometimes I manually recorded lsusb of dmesg_final.txt, mostly after I did some
>>> extra tests but did not want to record every step by the above 6 files.
>>>
>>> In one or two places I added some my own notes into COMMENTS file.
>>>
>>>
>>>
>>>
>>> I will try to guide you below where you can study which of the bugs. Mostly,
>>> for each bug you need just one subdirectory to look into, the other are just
>>> repeated the same bug under different kernel version or another patch.
>>> However, Sarah for the xHCI dead port issue will need to compare by diff
>>> two directories, one with the TI-based controller tests, the other with the
>>> NEC-based tests.
>>> Especially there, I would do something like:
>>>
>>> cd *TI-based; for f in dmesg*; do cut -c 15- $f > /tmp/TI/$f; done
>>> cd ../*NEC-based; for f in dmesg*; do cut -c 15- $f > /tmp/NEC/$f; done
>>>
>>> Then it should be easier to poke through file captured at the same test step,
>>> like:
>>>
>>> diff -u -w /tmp/TI/dmesg_initial__mouse_attached__unplugged__reattached_but_port_dead.txt \
>>> /tmp/NEC/dmesg_initial__mouse_attached__detached__reattached.txt
>>>
>>>
>>>
>>> Other than that, just diff pairs of files with each other, like:
>>>
>>> diff -u -w lspci_vvv_initial.txt lspci_vvv_initial__mouse_attached.txt
>>>
>>>
>>> Sorry that I sometimes used only a single underscore instead of double underscores
>>> to separate the test steps from each other in the filename.
>>>
>>>
>>> Martin Mokrejs wrote:
>>>> [ +linux-pci and Yinghai as they suffered already those many emails on individual
>>>> threads so one overviewing email hopefully won't harm] ;-)
>>>>
>>>> Martin Mokrejs wrote:
>>>>>
>>>>>
>>>>> Bjorn Helgaas wrote:
>>>>>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>>>>>> <mmokrejs@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> Hi Ying,
>>>>>>>
>>>>>>> huang ying wrote:
>>>>>>
>>>>>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>>>>>> operations.
>>>>>>>
>>>>>>>
>>>>>>> The incremental bits here, the full dmesg will send only directly to your email, due to its size.
>>>>>>
>>>>>> Is there a bugzilla for this issue? Please attach the complete dmesg
>>>>>> there or somewhere similar so we can all benefit.
>>>>>
>>>>> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
>>>>> list. After I hear a proposal from Rafel/Bjorn I will open separate bugs.
>>>>> I thought that the threads I started so far were enough but yes, dmesg
>>>>> files don't pass through list filters so I should move that to bugzilla.
>>>>>
>>>>> so far my view of the bugs was:
>>>>>
>>>>> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>>>>> (eSATA-based card)
>>>>
>>>> Fixed by Ying Huang port_dbg.patch applied over 3.8.5 (fixes acpiphp hotplug
>>>> of eSATA and Firewire cards, NOT the hotplug of a NEC-based USB3 card -> hence
>>>> the bug 4) below). Now I can continue using laptop-mode-tools.
>>>
>>> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_eSATA_testing
>>> 20130402/3.8.3-vanilla__with_laptop-mode-tools (with some comments in
>>> COMMENTS file)
>>
>> Thanks for your testing!
>>
>>>>> 2) xHCI dead due to its suspend - 3.8 series and above
>>>>
>>>> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
>>>> XHCI card *in an express card slot* does not suffer this suspend issue.
>>>> Although it is being put into suspend if a device is unplugged.
>>>
>>> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_TI-based
>>> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_NEC-based
>>>
>>> Same thing but yet without the port_dbg.patch:
>>> 20130402/3.9-rc5__with_2368081__with-latop-mode-tools_xhci_testing/
>>
>> It appears that TI xHCI dead port issue will present even if the PCIe
>> port will never go suspended. So I think this bug is not related to
>> PCIe port runtime PM but related to USB xHCI.
>>
>> Do you agree Sarah?
>
> Although I confirmed with 20130405.tar.bz2 dataset what Sarah repeated from our
> past findings in the email which should be just in your inbox, one thing is
> puzzling:
> When I have powersaving enabled upon bootup with NO USB devices attached to the TI
> controller, effectively while reaching multiuser mode the 0b:00.0 is in a suspend
> state. But, somehow, the very first mouse plugin works. Only the reject causes
> more 'aggressive' suspend.
> As it seems no upstream 1c.4 is messing up here (in the test Sarah wanted me to do
> we have all control files 'on' except the end 0b:00.0) then really still something
> *else* is causing the dead port *in conjunction* with 'suspended' runtime state.
> Please double check what I wrote initially about the 20130402.tar.bz2 dataset.
> Notably, I would compare lspci outputs from a cold boot state with no devices
> attached and suspended 0b:00.0 (the 20130402.tar.bz2 dataset) with the dead port
> status in lspci (find any in 20130402.tar.bz2 or now in 20130405.tar.bz2).
>
> Martin
>
>>
>> [snip]
>>
>> Best Regards,
>> Huang Ying