On Fri, 29 Nov 2013, Kristian Evensen wrote:

> Hello,
>
> I am currently working on an embedded project based on the Atheros
> AR9344 SoC. As a prototype device, we are using the TP-Link TL-WDR4300
> router (http://wiki.openwrt.org/toh/tp-link/tl-wdr4300) and latest
> OpenWRT trunk. The kernel is 3.10.18.
>
> We have over the last couple of weeks experienced a USB problem that
> we have not been able to solve. The USB hub works fine most of the
> time, but when event X happens, USB becomes unusable for extended
> periods of time. We have to disable/enable the power on the USB port
> (using GPIO) and then wait until a timeout expires/queue is flushed.
>
> The devices we have been able to trigger event X with is different
> 3G/LTE modems. We have not been able to figure out exactly what
> triggers the event, but it happens when we move into areas with poor
> or no coverage and then move back into coverage. We see the error both
> with QMI-modems (qmi_wwan driver), AT-modems (option_serial driver)
> and WebUI-modems (cdc_ether driver). When looking in dmesg after this
> event has happened, the following messages appear based on the modem
> type:
>
> QMI:
> Thu Nov 21 09:44:53 2013 kern.err kernel: [ 490.600000] qmi_wwan
> 1-1.1.2:1.4: nonzero urb status received: -71
> Thu Nov 21 09:44:53 2013 kern.err kernel: [ 490.600000] qmi_wwan
> 1-1.1.2:1.4: wdm_int_callback - 0 bytes
>
> Serial:
> [62979.280000] option1 ttyUSB7: option_instat_callback: error -71
>
> WebUI:
> [ 1192.680000] hub 1-1:1.0: cannot reset port 1 (err = -71)
> [ 1192.690000] hub 1-1:1.0: Cannot enable port 1. Maybe the USB cable is bad?
>
> The common denominator seems to be the -71 error code, which is a
> generic Protocol Error if I have understood correctly. When I search
> for this error code, it seems that most problems have been due to
> power. However, this seems not be the issue here. The modems are
> connected to an active hub and event X happens with only a single
> modem connected, so it seems unlikely that it is power.

The most common reason for -71 errors is that the device failed to send
a reply or handshake packet back to the host. Generally this is caused
by a bug in the device's firmware (it can also be caused by unplugging
the USB cable, but obviously that didn't happen here).

Ideally, if you knew what caused the device to go into this buggy
state, you could avoid the situation.

> My question is, has anyone experienced anything similar and know how
> to solve this problem, or have any ideas on how to proceed? Since the
> error seems to be independent of drivers, I guess it points to this
> being hardware related. Would for example reducing QH_XACTERR_MAX be a
> possible (temporary) solution,

It would not help. Once the device stops replying to the host, it
pretty much doesn't matter what you do on the host. The only way to
address the problem is to do some sort of error recovery on the device.

> or are there any ways to flush this
> queue once we see the error? The most critical part for us is that USB
> is blocked for such extended periods of time.

You could try doing a USB reset of the device. Of course, this is
likely to cause the device to lose all its settings, so it may end up
being worse than the original problem.

Alan Stern
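
For reference, a USB reset of a single device can be requested from
user space through usbfs. What follows is only a minimal sketch, not
something taken from this thread: the helper names (usbdevfs_reset_code,
usbfs_path, reset_usb_device) are made up for illustration, the
/dev/bus/usb/BBB/DDD layout is assumed to be present on the target, and
the caller needs write access to the node (normally root). The ioctl
number is built by hand from _IO('U', 20); on x86 and ARM that is
0x5514, but MIPS (which the AR9344 is) encodes the "no data" direction
differently, so the value should be checked against the target's own
<linux/usbdevice_fs.h>. Whether a reset actually recovers a modem that
has stopped answering is not guaranteed.

```python
import fcntl
import os


def usbdevfs_reset_code(ioc_none: int = 0, dirshift: int = 30) -> int:
    """Build the USBDEVFS_RESET request number, i.e. _IO('U', 20).

    The defaults match the generic ioctl encoding (x86, ARM), where
    _IOC_NONE is 0. MIPS headers use _IOC_NONE = 1 with a direction
    shift of 29; pass ioc_none=1, dirshift=29 there (assumption: verify
    against the target's kernel headers).
    """
    return (ioc_none << dirshift) | (ord("U") << 8) | 20


def usbfs_path(bus: int, dev: int) -> str:
    """Return the usbfs node for a bus number / device number pair."""
    return f"/dev/bus/usb/{bus:03d}/{dev:03d}"


def reset_usb_device(path: str, request: int) -> None:
    """Ask the kernel to reset the device behind a usbfs node.

    Raises OSError if the node cannot be opened or the reset fails.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)
```

A call would look like
reset_usb_device(usbfs_path(1, 4), usbdevfs_reset_code()), where bus 1
and device 4 are placeholders; the real numbers for a device such as
1-1.1.2 can be read from its busnum and devnum attributes in sysfs.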