Hi,
  I just tried the 3.9 kernel with pcie_aspm=off and, in another attempt, with pcie_aspm=native. I realized the message "HW died" happens only in the former case.

I believe this is a bug. If I unplug an express card with a NEC-based USB3 host, it should be properly terminated, and xhci_hcd should unbind *even* when "HW died" happened. That is not the case now, so I have to do:

echo 1 > /sys/bus/pci/devices/0000:11:00.0/remove

to get rid of the stale 11:00 device from my system (sysfs entries):

/proc/iomem
 f1104000-f1104fff : r8169
 f6800000-f6bfffff : 0000:00:02.0
 f6c00000-f7cfffff : PCI Bus 0000:11
-f6c00000-f6c01fff : 0000:11:00.0
-f6c00000-f6c01fff : xhci_hcd
 f7d00000-f7dfffff : PCI Bus 0000:0b
 f7d00000-f7d0ffff : 0000:0b:00.0
 f7d00000-f7d0ffff : xhci_hcd

/proc/interrupts:
- 45: 1 0 PCI-MSI-edge xhci_hcd
- 46: 0 0 PCI-MSI-edge xhci_hcd
- 47: 0 0 PCI-MSI-edge xhci_hcd

Let's say that with pcie_aspm=off the first hot eject of the express card with the USB3.0 controller does not result in "HW died" but in "HC error bitmask = 0x4", whatever that means. That is because pciehp is broken under pcie_aspm=off (unlike under pcie_aspm=native), but that is not a story for linux-usb.
[ 62.960729] xhci_hcd 0000:0b:00.0: Poll event ring: 4294943584
[ 62.960732] xhci_hcd 0000:11:00.0: Poll event ring: 4294943584
[ 62.960757] xhci_hcd 0000:11:00.0: op reg status = 0x0
[ 62.960763] xhci_hcd 0000:11:00.0: ir_set 0 pending = 0x2
[ 62.960764] xhci_hcd 0000:11:00.0: HC error bitmask = 0x4
[ 62.960765] xhci_hcd 0000:11:00.0: Event ring:
[ 62.960768] xhci_hcd 0000:11:00.0: @00000000d6020400 d6020000 00000000 01003028 0000c001
[ 62.960769] xhci_hcd 0000:0b:00.0: op reg status = 0x0
[ 62.960771] xhci_hcd 0000:11:00.0: @00000000d6020410 00000000 00000000 00000000 00000000
[ 62.960772] xhci_hcd 0000:11:00.0: @00000000d6020420 00000000 00000000 00000000 00000000
[ 62.960773] xhci_hcd 0000:0b:00.0: ir_set 0 pending = 0x2
[ 62.960775] xhci_hcd 0000:11:00.0: @00000000d6020430 00000000 00000000 00000000 00000000
[ 62.960776] xhci_hcd 0000:0b:00.0: HC error bitmask = 0x0
[ 62.960777] xhci_hcd 0000:11:00.0: @00000000d6020440 00000000 00000000 00000000 00000000

The kernel is still looking for the device, which is silly because the device has already been ejected from the express card slot:

+[ 62.961160] xhci_hcd 0000:11:00.0: // xHC command ring deq ptr low bits + flags = @00000008
+[ 62.961161] xhci_hcd 0000:11:00.0: // xHC command ring deq ptr high bits = @00000000

A subsequent hot re-insert of the card goes unnoticed by pciehp (due to a bug caused by pcie_aspm=off), and therefore xhci_hcd is puzzled and spits out:

+[ 123.191537] xhci_hcd 0000:0b:00.0: Poll event ring: 4294949600
+[ 123.191547] xhci_hcd 0000:11:00.0: Poll event ring: 4294949600
+[ 123.191557] xhci_hcd 0000:11:00.0: op reg status = 0xffffffff
+[ 123.191563] xhci_hcd 0000:0b:00.0: op reg status = 0x0
+[ 123.191570] xhci_hcd 0000:0b:00.0: ir_set 0 pending = 0x2
+[ 123.191574] xhci_hcd 0000:11:00.0: HW died, polling stopped.
+[ 123.191580] xhci_hcd 0000:0b:00.0: HC error bitmask = 0x0

At this step xhci_hcd should unbind the dead device so that its sysfs entries can be removed (both iomem and interrupts).
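For anyone who wants to spot this state from a script until the driver handles it, here is a rough sketch in Python. It scans dmesg text for the "op reg status" debug lines shown above and reports controllers whose last logged status read was 0xffffffff, which is the value logged right before "HW died" here and is what reads from an absent PCI device typically return. The function names dead_controllers and removal_commands are made up for illustration, and the sketch assumes the xhci debug output is enabled as in my logs. It only prints the manual workaround command; it does not write to sysfs itself.

```python
import re

# Matches xhci debug lines such as:
#   [ 123.191557] xhci_hcd 0000:11:00.0: op reg status = 0xffffffff
STATUS_RE = re.compile(
    r"xhci_hcd (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"op reg status = 0x(?P<status>[0-9a-f]+)"
)


def dead_controllers(dmesg_text):
    """Return PCI addresses whose most recent logged status read was all ones."""
    last_status = {}
    for match in STATUS_RE.finditer(dmesg_text):
        # Later lines overwrite earlier ones, so only the newest read counts.
        last_status[match.group("addr")] = int(match.group("status"), 16)
    return sorted(a for a, s in last_status.items() if s == 0xFFFFFFFF)


def removal_commands(addresses):
    """Build the manual workaround command for each address (not executed)."""
    return [f"echo 1 > /sys/bus/pci/devices/{a}/remove" for a in addresses]


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for command in removal_commands(dead_controllers(sys.stdin.read())):
        print(command)
```

This is only a detection aid for the symptom; the proper fix is still for xhci_hcd to release the device on its own.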
If the unbind does not happen or is not done manually, a subsequent hot insert has no chance to succeed: it silently proceeds, but the device is left unconfigured and the sysfs entries show just crappy cached values. This can be demonstrated when a desperate user inserts a different express card (lspci shows a mixture of both cards, but the sysfs entries show only the old data).

Let's clean up the mess and ensure xhci_hcd releases the resources allocated by the dead device. I speculate that "HC error bitmask = 0x4" should result in a "HW died" case as well.

Thank you,
Martin

P.S.: Collected dmesg/lspci/iomem/interrupts data are at: http://195.113.57.32/~mmokrejs/tmp/20130430.tar.bz2 in the 3.9/off subdirectory (the pcie_aspm=off case). The working pcie_aspm=native behavior is documented under the 3.9/native subdirectory.