Alan Stern wrote:
> On Wed, 1 May 2013, Martin Mokrejs wrote:
>
>> Sarah Sharp wrote:
>>> The "HW died, polling stopped" message is harmless. It happens when the
>>> xHCI host goes into a PCI low power state (D3). When the PCI host goes
>>> into D3cold, the registers will read as all Fs, and the polling loop
>>> will mistakenly believe the hardware has been removed. However, this
>>> bug only effects the debug code. It does not effect any other part of
>>> the xHCI driver.
>>
>> I think I do not mind it affects just the XHCI_DEBUG stuff. I just refer
>> to "those" places in the source code where something else *could* happen:
>> a detection of a silently ejected or dead hardware.
>>
>> I really did unplug the express card providing second USB3.0 controller
>> (11:00). My point was that although pciehp did not propagate the hot eject
>> to downstream drivers (xhci_hcd) I believe xhci_hcd could have realized it
>> by itself because it does polling time to time and this, albeit debugging
>> code, shows where roughly something more clever could happen. Ideally in
>> place of the "HC error bitmask = 0x4" (due to un-notified hot removal) or
>> at least at the time when "HW died, polling stopped" was printed
>> (un-notified hot-reinsert) xhci_hcd could realize a device is gone.
>
> That's not how drivers work in Linux. They don't unbind all by
> themselves; they wait until the bus-level code tells them to unbind.
> xhci-hcd is not alone in this respect; all the drivers behave this way.

I don't believe that. From my tests only the USB3 express card suffered
"the problem", unlike the firewire_ohci and sata_sil24-based cards. Do you
remember the thread https://lkml.org/lkml/2012/4/16/566 ... where about a
60 sec timeout was needed to have usb working again? I think I saw others
meanwhile talking about a 30 sec delay, but I believe this would all be
easier if xhci_hcd did unbind itself from a dead device.
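
To make the "registers read as all Fs" case from Sarah's explanation concrete, here is a minimal sketch in C. It is not the xhci-hcd code: the helper name reg_read_looks_dead is hypothetical and chosen only for illustration. It shows the kind of test a polling loop could apply to a 32-bit register value it has just read, and why that test alone cannot tell a host in D3cold from a card that was really pulled out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the xhci driver.
 *
 * A read from a PCI device that does not respond comes back as all
 * ones. The same value is seen when the device is still present but
 * powered down (D3cold), so a "true" result here is only a hint that
 * the hardware may be gone, not proof of removal.
 */
static bool reg_read_looks_dead(uint32_t value)
{
    return value == UINT32_MAX; /* 0xffffffff, i.e. "all Fs" */
}
```

A caller would pass in the value it got back from a register read, for example reg_read_looks_dead(status). Any value other than 0xffffffff means the device answered the read.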

I am naively thinking that PCI has no way to detect that a card was hot
unplugged if e.g. hotplug was completely left out of the kernel .config or
when acpiphp/pciehp don't work, for whatever reason. But xhci_hcd has the
unique advantage that it does polling and it knows the device is dead.
Probably the same applies to uhci/ehci. I just don't see why, if an upper
level realizes there is a problem, it could not take action. Other drivers
probably don't do polling, by design, so they are in a different situation.

>
>> So what can be done so that the user does not have to run
>>
>> echo 1 > /sys/bus/pci/devices/0000:11:00.0/remove
>>
>> manually? Couldn't xhci_hcd detect somehow that the device is dead or ejected?
>
> It could detect that the device is dead. In fact, it probably detects
> that now. But even if it could tell that the device had been ejected,
> it would not unbind itself.
>
> What can be done is to fix the PCIe core code so that it correctly
> realizes when an eject takes place.

I believe that will be fixed one day, as I found that pciehp is broken in
its action by pcie_aspm=off whereas it works when pcie_aspm=native. That in
turn points to bad ASPM L0/L1 handling and seems similar to issues others
had with PCIe LnkCtl on iwlwifi. That is somehow related to those OSC_
trickeries in acpi. Finally, it seems others hit ASPM issues with Dell
Vostro laptops. :( This will all hopefully get fixed. But I want a usb fix
as well. ;-)

Martin