Re: Bus noise periodically causes ci_hdrc IRQ lockup

On Fri, Feb 22, 2019 at 10:43:17AM -0500, Chandler Griscom wrote:
> Hello,
> 
> I am encountering an issue where noise on USB devices is causing the
> host ci_hdrc driver to stall.  The system contains an i.MX6 board
> (UDOO) connected to a USB touchscreen, SMSC95xx hub, an FTDI device,
> and a hi-speed camera.
> 
> Occasionally (after hours or days), or in a noisy environment, all the
> devices on the root hub stop working.  They show up in debugfs, lsusb,
> etc, but any attempt to communicate with them or reset through
> /sys/bus/usb times out with error -110 or -71.
> 
> dmesg, ci_hdrc debugfs entries, and lsusb -v are posted here:
> https://gist.github.com/cjgriscom/5238df9fbf7ffc4f558b37b5883f8398
> 
> Performing a bind/unbind on ci_hdrc with the following commands results in a successful reset:
>  # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/unbind
>  # echo "ci_hdrc.0" > /sys/bus/platform/drivers/ci_hdrc/bind
> 
> The issue seems to strongly correlate with a large error count in the
> IRQ counter in /sys/kernel/debug/usb/ehci/ci_hdrc.0/registers, whereas
> under normal operation the count is very low:
>   irq normal 1031800 err 199069 iaa 17040 (lost 0)
> After the lockup, interrupts appear to stop firing as the count stops incrementing.
> 
> I have not yet found a way to reproduce the error outside of the
> machine where it occurs.  Swapping hardware has not made a difference.
> I have tried artificially inducing bit errors by manipulating the data
> lines of one of the attached USB ports, and while this creates a large
> number of errors, the bus is able to recover once it returns to normal
> operation.  The most reliable way that I have used to reproduce the
> failure locally is to run a welder nearby, and the driver usually
> fails within minutes.

This sentence is the best thing I have read in a bug report in a very
long time, thank you for it. :)

Yes, electrically noisy environments can cause bad problems, and not all
hardware is equally able to recover from them.

Peter is the maintainer for this driver; he would know best, as he has
access to the hardware data sheets for this chip and can test things
out.  Maybe he even has access to a good arc welder...

Peter, any ideas?
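
Until the root cause is found, the unbind/bind workaround from the report
could be automated by a small userspace watchdog.  What follows is only a
rough sketch in Python, not a fix and not anything the driver provides.  It
assumes the debugfs path and the "irq normal ... err ... iaa ... (lost ...)"
line format quoted in the report.  The function names, the 10 second
polling interval, and the "no counter moved between two samples" heuristic
are all made up for illustration, and the heuristic could misfire on a
genuinely idle bus.

```python
import re
import time
from pathlib import Path

# Paths and device name taken from the report; they may differ on other boards.
REGISTERS = Path("/sys/kernel/debug/usb/ehci/ci_hdrc.0/registers")
DRIVER_DIR = Path("/sys/bus/platform/drivers/ci_hdrc")
DEVICE = "ci_hdrc.0"

# Matches the counter line as quoted in the report:
#   irq normal 1031800 err 199069 iaa 17040 (lost 0)
IRQ_LINE = re.compile(r"irq normal (\d+) err (\d+) iaa (\d+) \(lost (\d+)\)")


def parse_irq_counters(text):
    """Return (normal, err, iaa, lost) as ints, or None if the line is absent."""
    match = IRQ_LINE.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def looks_stalled(previous, current):
    """Hypothetical heuristic: two valid samples with identical counters."""
    return previous is not None and current is not None and previous == current


def rebind(device=DEVICE, driver_dir=DRIVER_DIR):
    """Same unbind/bind sequence as the shell commands in the report (needs root)."""
    (driver_dir / "unbind").write_text(device + "\n")
    (driver_dir / "bind").write_text(device + "\n")


def watch(interval_s=10.0):
    """Poll the counters forever and rebind when they stop moving."""
    previous = None
    while True:
        try:
            current = parse_irq_counters(REGISTERS.read_text())
        except OSError:
            # The debugfs file can be missing briefly around a rebind.
            current = None
        if looks_stalled(previous, current):
            rebind()
            current = None  # start over with fresh samples after the rebind
        previous = current
        time.sleep(interval_s)
```

Running watch() as root with debugfs mounted would poll the counters and
rebind when they freeze.  The ratio of "err" to "normal" from
parse_irq_counters() might also be worth logging, since the report
correlates a high error count with the lockup.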

> I have seen the failure occur on the following kernels:
> 3.14
> 4.15.7
> 4.18.20
> 4.20.6
> 5.0-rc7
> 
> Similar reports:
> This old bug report at NXP seems to describe the same issue: https://community.nxp.com/thread/355151
> A similar issue seems to have been fixed in the dwc_otg driver: https://github.com/raspberrypi/linux/issues/552

That's interesting, I don't see where that bug was fixed in that issue
report, just that it was "resolved" in a newer update.  Trying to figure
out what the actual commit was might be helpful.

thanks,

greg k-h


