Re: v4.1.4 xhci: 50%~66% chance of "HC died; cleaning up" on boot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 24 Aug 2015 12:48:13 +0300, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> Do the usb devices work normally after a successful boot?
> (no suspicious xhci entries in dmesg)

There are indeed some suspicious messages in my current dmesg:

$ dmesg | grep "[xeou]hci"
[    2.115002] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    2.115018] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[    2.115157] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x00009810
[    2.115165] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported
[    2.115477] usb usb1: Manufacturer: Linux 4.1.4+ xhci-hcd
[    2.116734] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    2.116744] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[    2.116826] usb usb2: Manufacturer: Linux 4.1.4+ xhci-hcd
[    2.426642] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[    7.337807] xhci_hcd 0000:00:14.0: Command completion event does not match command
[    7.337808] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[   12.349069] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[   17.360340] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[   17.472401] usb 1-1: new high-speed USB device number 3 using xhci_hcd
[   17.772584] usb 1-4: new high-speed USB device number 4 using xhci_hcd
[   17.856780] usb 2-1: new SuperSpeed USB device number 4 using xhci_hcd

(completion mismatch & timeout)
"HC died" is not present.

> Add xhci boot debugging to the kernel cmdline:
> 
> dyndbg='module xhci_hcd +p'
> 
> or, if as a separate module:
> 
> xhci_hcd.dyndbg=+p

Thanks, added.

In an attempt to get the failed log still stored to disk, I've modified
my fstab to not mount /boot until accessed: on the mass storage device
there is only /boot, the rest is on a sata hdd[1]. This lowered the bug
probability (maybe by reducing USB load early on ?). I also worked on
getting rid of an apparently unrelated boot annoyance (it turned out
uswsusp locking hard during initrd and requiring me to pass "resume="
parameter on each boot).

After about 10 reboots/cold boots, re-enabling /boot "auto", I
triggered the error again, with given switch. AFAIKS there is a single
extra line (still using "quiet" to avoid loosing output, and typing
from screenshot):

[ 7.34] xhci_hcd 0000:00:14.0: Command completion event does not match command
[12.36] usb 1-1: hub failed to enable device, error -62
[24.99] xhci_hcd 0000:00:14.0: Stopped the command ring railed, maybe the host is dead
[24.99] xhci_hcd 0000:00:14.0: Abort command ring failed
[24.99] xhci_hcd 0000:00:14.0: HC died; cleaning up
[25.00] usb usb1-port1: couldn't allocate usb_device
[25.20] usb 2-1: device not accepting address 2, error -62
[25.20] usb usb2-port1: couldn't allocate usb_device
[25.20] pci_pm_runtime_suspend(): hcd_pci_runtime_suspend+0x0/0x60 [usbcore] returns -16

I do have custom udev rules for power management (topmost rules):

$ cat /etc/udev/rules.d/local.rules 
# From https://wiki.archlinux.org/index.php/Power_management
KERNELS=="0000:03:00.0", GOTO="nopowercontrol"
ACTION=="add|change", SUBSYSTEM=="pci", ATTR{power/control}="auto"
LABEL="nopowercontrol"
ACTION=="add|change", SUBSYSTEM=="usb", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="add|change", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="min_power"
# Built-in USB drive is not rotational, but kernel thinks it is.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", ENV{ID_MODEL_ID}=="b155", ENV{ID_VENDOR_ID}=="1005", ATTR{queue/rotational}="0"
# Slot-stable symlinks
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="4:0:0:0", SYMLINK+="qnap/hdd1"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="3:0:0:0", SYMLINK+="qnap/hdd2"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="2:0:0:0", SYMLINK+="qnap/hdd3"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="8:0:0:0", SYMLINK+="qnap/hdd4"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="7:0:0:0", SYMLINK+="qnap/hdd5"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="6:0:0:0", SYMLINK+="qnap/hdd6"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="1-4", SYMLINK+="qnap/builtin_usb"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", KERNELS=="2-1.3", SYMLINK+="qnap/front_usb"
# Fan speed
ACTION=="add|change", KERNEL=="f71882fg.656", ATTR{pwm1_auto_point4_pwm}="100",ATTR{pwm1_auto_point4_temp}="35000", ATTR{pwm1_auto_point5_pwm}="30", ATTR{pwm1_auto_channels_temp}="2", ATTR{pwm2_auto_point4_pwm}="100", ATTR{pwm2_auto_point4_temp}="35000", ATTR{pwm2_auto_point5_pwm}="30", ATTR{pwm2_auto_channels_temp}="1", ATTR{pwm3_enable}="1", ATTR{pwm3}="0"

FWIW, I disabled power sving on 0000:03:00.0 which contains the two Gb
ethernet adapters because it caused issues - issues which are about
next in my "to debug" list.

These udev rules are likely applied by the time the errors happen,
because fan speeds (last rule) do change before that.

> If I make some extra xhci debugging patch on top of v4.1.4 can you try it out?
> (maybe dumping the command ring and the command list)

Sure. Another git repository, git format-patch, git send-email, all
fine for me.

[1] The bios somehow fails booting from sata - this is a NAS enclosure
which firmware I'm replacing with general-purpose linux distro, original
firmware does boot on that usb device.
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux