Bus noise periodically causes ci_hdrc IRQ lockup

Hello,

I am encountering an issue where electrical noise on the USB bus causes the host ci_hdrc driver to stall.  The system contains an i.MX6 board (UDOO) connected to a USB touchscreen, an SMSC95xx hub, an FTDI device, and a hi-speed camera.

Occasionally (after hours or days), or sooner in a noisy environment, all the devices on the root hub stop working.  They still show up in debugfs, lsusb, etc., but any attempt to communicate with them or to reset them through /sys/bus/usb times out with error -110 (ETIMEDOUT) or fails with -71 (EPROTO).

dmesg, ci_hdrc debugfs entries, and lsusb -v are posted here:
https://gist.github.com/cjgriscom/5238df9fbf7ffc4f558b37b5883f8398

Performing a bind/unbind on ci_hdrc with the following commands results in a successful reset:
 # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/unbind
 # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/bind
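
As a minimal sketch, the two commands above could be wrapped in a small shell function so the reset can be triggered from a script. The function name rebind_ci_hdrc, the DRIVER_DIR and REBIND_DELAY variables, and the one-second pause between unbind and bind are assumptions made for illustration; only the sysfs paths and the device name come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: unbind and rebind a device on the ci_hdrc platform driver.
# DRIVER_DIR and REBIND_DELAY are illustrative overrides, not kernel interfaces.
rebind_ci_hdrc() {
    dev="${1:-ci_hdrc.0}"
    drv="${DRIVER_DIR:-/sys/bus/platform/drivers/ci_hdrc}"

    # Detach the device from the driver; stop if the write fails.
    echo "$dev" > "$drv/unbind" || return 1

    # Assumed settle time before rebinding; adjust as needed.
    sleep "${REBIND_DELAY:-1}"

    # Attach the device again, which re-initializes the controller.
    echo "$dev" > "$drv/bind" || return 1
}
```

Running `rebind_ci_hdrc ci_hdrc.0` as root would perform the same reset as the two echo commands.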

The issue seems to strongly correlate with a large error count in the IRQ counter in /sys/kernel/debug/usb/ehci/ci_hdrc.0/registers, whereas under normal operation the count is very low:
  irq normal 1031800 err 199069 iaa 17040 (lost 0)
After the lockup, interrupts appear to stop firing as the count stops incrementing.
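
One way to watch for this condition, sketched here under assumptions, is to sample the counter line twice and compare. The function names irq_counts and irq_stalled and the default 5-second interval are hypothetical; the line format is the one shown above. Note that an unchanged count could in principle also occur on an idle bus, so this is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: print the "normal" and "err" counts from a line of the
# form "irq normal N err E iaa I (lost L)" found in the given file.
irq_counts() {
    awk '$1 == "irq" && $2 == "normal" { print $3, $5; exit }' "$1"
}

# Hypothetical helper: succeed (exit status 0) if both counts are unchanged
# after waiting the given number of seconds (assumed default: 5).
irq_stalled() {
    before=$(irq_counts "$1")
    sleep "${2:-5}"
    after=$(irq_counts "$1")
    # Require a non-empty sample so a missing file is not reported as a stall.
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

For example, `irq_stalled /sys/kernel/debug/usb/ehci/ci_hdrc.0/registers 5 && echo stalled` would report when no interrupts were counted over five seconds.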

I have not yet found a way to reproduce the error outside of the environment where it occurs.  Swapping hardware has not made a difference.  I have tried artificially inducing bit errors by manipulating the data lines of one of the attached USB ports; this creates a large number of errors, but the bus recovers once the lines return to normal operation.  The most reliable way I have found to reproduce the failure locally is to run a welder nearby, after which the driver usually fails within minutes.

I have seen the failure occur on the following kernels:
3.14
4.15.7
4.18.20
4.20.6
5.0-rc7

Similar reports:
This old bug report at NXP seems to describe the same issue: https://community.nxp.com/thread/355151
A similar issue seems to have been fixed in the dwc_otg driver: https://github.com/raspberrypi/linux/issues/552

Help and pointers as to how to get better logs and debug info would be useful.  I'm able to recompile the kernel and test as needed on my end.

Thanks,
Chandler Griscom



