On Mon, 24 Aug 2015 12:48:13 +0300, Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote: > Do the usb devices work normally after a successful boot? > (no suspicious xhci entries in dmesg) There are indeed some suspicious messages in my current dmesg: $ dmesg | grep "[xeou]hci" [ 2.115002] xhci_hcd 0000:00:14.0: xHCI Host Controller [ 2.115018] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1 [ 2.115157] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x00009810 [ 2.115165] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported [ 2.115477] usb usb1: Manufacturer: Linux 4.1.4+ xhci-hcd [ 2.116734] xhci_hcd 0000:00:14.0: xHCI Host Controller [ 2.116744] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2 [ 2.116826] usb usb2: Manufacturer: Linux 4.1.4+ xhci-hcd [ 2.426642] usb 1-1: new high-speed USB device number 2 using xhci_hcd [ 7.337807] xhci_hcd 0000:00:14.0: Command completion event does not match command [ 7.337808] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 12.349069] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 17.360340] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 17.472401] usb 1-1: new high-speed USB device number 3 using xhci_hcd [ 17.772584] usb 1-4: new high-speed USB device number 4 using xhci_hcd [ 17.856780] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd (completion mismatch & timeout) "HC died" is not present. > Add xhci boot debugging to the kernel cmdline: > > dyndbg='module xhci_hcd +p' > > or, if as a separate module: > > xhci_hcd.dyndbg=+p Thanks, added. In an attempt to get the failed log still stored to disk, I've modified my fstab to not mount /boot until accessed: on the mass storage device there is only /boot, the rest is on a sata hdd[1]. This lowered the bug probability (maybe by reducing USB load early on ?). I also worked on getting rid of an apparently unrelated boot annoyance (it turned out uswsusp locking hard during initrd and requiring me to pass "resume=" parameter on each boot). After about 10 reboots/cold boots, re-enabling /boot "auto", I triggered the error again, with given switch. AFAIKS there is a single extra line (still using "quiet" to avoid loosing output, and typing from screenshot): [ 7.34] xhci_hcd 0000:00:14.0: Command completion event does not match command [12.36] usb 1-1: hub failed to enable device, error -62 [24.99] xhci_hcd 0000:00:14.0: Stopped the command ring railed, maybe the host is dead [24.99] xhci_hcd 0000:00:14.0: Abort command ring failed [24.99] xhci_hcd 0000:00:14.0: HC died; cleaning up [25.00] usb usb1-port1: couldn't allocate usb_device [25.20] usb 2-1: device not accepting address 2, error -62 [25.20] usb usb2-port1: couldn't allocate usb_device [25.20] pci_pm_runtime_suspend(): hcd_pci_runtime_suspend+0x0/0x60 [usbcore] returns -16 I do have custom udev rules for power management (topmost rules): $ cat /etc/udev/rules.d/local.rules # From https://wiki.archlinux.org/index.php/Power_management KERNELS=="0000:03:00.0", GOTO="nopowercontrol" ACTION=="add|change", SUBSYSTEM=="pci", ATTR{power/control}="auto" LABEL="nopowercontrol" ACTION=="add|change", SUBSYSTEM=="usb", TEST=="power/control", ATTR{power/control}="auto" ACTION=="add|change", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="min_power" # Built-in USB drive is not rotational, but kernel thinks it is. ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", ENV{ID_MODEL_ID}=="b155", ENV{ID_VENDOR_ID}=="1005", ATTR{queue/rotational}="0" # Slot-stable symlinks ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="4:0:0:0", SYMLINK+="qnap/hdd1" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="3:0:0:0", SYMLINK+="qnap/hdd2" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="2:0:0:0", SYMLINK+="qnap/hdd3" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="8:0:0:0", SYMLINK+="qnap/hdd4" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="7:0:0:0", SYMLINK+="qnap/hdd5" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="6:0:0:0", SYMLINK+="qnap/hdd6" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="1-4", SYMLINK+="qnap/builtin_usb" ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="2-1.3", SYMLINK+="qnap/front_usb" # Fan speed ACTION=="add|change", KERNEL=="f71882fg.656", ATTR{pwm1_auto_point4_pwm}="100",ATTR{pwm1_auto_point4_temp}="35000", ATTR{pwm1_auto_point5_pwm}="30", ATTR{pwm1_auto_channels_temp}="2", ATTR{pwm2_auto_point4_pwm}="100", ATTR{pwm2_auto_point4_temp}="35000", ATTR{pwm2_auto_point5_pwm}="30", ATTR{pwm2_auto_channels_temp}="1", ATTR{pwm3_enable}="1", ATTR{pwm3}="0" FWIW, I disabled power sving on 0000:03:00.0 which contains the two Gb ethernet adapters because it caused issues - issues which are about next in my "to debug" list. These udev rules are likely applied by the time the errors happen, because fan speeds (last rule) do change before that. > If I make some extra xhci debugging patch on top of v4.1.4 can you try it out? > (maybe dumping the command ring and the command list) Sure. Another git repository, git format-patch, git send-email, all fine for me. [1] The bios somehow fails booting from sata - this is a NAS enclosure which firmware I'm replacing with general-purpose linux distro, original firmware does boot on that usb device. -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html