On Wed, Jan 23, 2019 at 11:05:40AM -0500, Alan Stern wrote:
> On Wed, 23 Jan 2019, Bin Liu wrote:
>
> > On Wed, Jan 23, 2019 at 03:55:47PM +0100, Johan Hovold wrote:
> > > On Wed, Jan 23, 2019 at 08:09:47AM -0600, Bin Liu wrote:
> > > > On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
> > > > > On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> > > >
> > > > > > That's not what any other host controller returns when a device is
> > > > > > removed, so either you are going to have to fix all USB drives for this
> > > > > > issue, or you need to fix the musb driver to not send this error for
> > > > > > when a device is removed (hint, do the latter...)
> > > > >
> > > > > Right, this needs to be handle at the HCD level.
> > > >
> > > > Any reason usb_serial_generic_read_bulk_callback() doesn't handle
> > > > -EPROTO in the same way as -EPIPE?
> > >
> > > Since it is supposed to be intermittent unlike, for example, -ENOENT or
> > > -EPIPE (the latter which the device driver can recover from if it cares
> > > to implement clearing of halt).
>
> Wait a minute.  Nothing says any of those errors is supposed to be
> intermittent.  As long as an error has a systematic cause (as opposed
> to random noise, for example), it will recur as often as the cause
> does.
>
> At least when -EPROTO errors are caused by device disconnect, we know
> that they will eventually go away when the upstream hub reports the
> port disconnect event.  But until then, an interrupt storm is certainly
> possible.
>
> > Okay, makes sense.
> >
> > >
> > > > > dwc2 fixed a similar lockup issue due to retried NAKed transaction by
> > > > > not retrying immediately:
> > > > >
> > > > > 38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
> > > >
> > > > Both cases are all about device removal, but this musb case is slightly
> > > > different from this dwc2 case.
> > > >
> > > > It is all about re-transmitting which causes interrupt storm, but in
> > > > this dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
> > > > makes sense to delay it in the dwc2 driver as this referred patch does,
> > > >
> > > > but in this musb case, musb driver reports transaction error to the usb
> > > > serial driver, the usb serial driver issues the re-transmitting not the
> > > > musb driver, so I don't think the delay should be added in the musb
> > > > driver.
> > >
> > > I didn't say it was exactly the same.
> >
> > Yeah, I know. My point was the fix is in the place where re-transmitting
> > happens, but
> >
> > > My point was that unless you fix this at the HCD level, you will need to
> > > add complex recovery handling to every USB driver and completion handler
> > > (~500 of those). But perhaps that is what it needed.
> >
> > okay, it probably make sense to handle the case in HCD because the
> > number of HCD is much less.
>
> One possibility is to giveback URBs with certain errors (such as
> -EPROTO) only at a frame boundary, or at 1-ms intervals.  This feels
> like a very artificial solution, though.

My plan is to add an error counter to the musb driver's endpoint struct:
if -EPROTO has occurred consecutively a certain number of times, for
example 3, give back URBs with -EPIPE instead of -EPROTO. This is the
simplest solution I can think of. I hate expanding the struct
unnecessarily, but this is one of the cases where it is needed.

>
> > > I do see now that of all USB drivers we have two drivers that handles
> > > -EPROTO by resubmitting after a delay, while a handful explicitly deals
> > > with -EPROTO by simply stopping to resubmit (some probably bail out on
> > > all errors, but the majority appear to resubmit on -EPROTO).
>
> Any driver which immediately retries an URB after getting -EPROTO or
> -EILSEQ or -ETIME, and has no mechanism for backing off or limiting the
> retries, is buggy.  As far as I can see, that's all there is to it.

Agreed, but given that the majority appear to resubmit on -EPROTO, as
Johan said, I think it is better to handle it in the HCD.

> > Thanks for the info.
> > I will handle this case in musb driver.
>
> Why doesn't the same problem occur with other types of host controller?

Not sure, I work on musb most of the time. Maybe other HCDs don't give
back URBs with -EPROTO in such an error case. The musb controller has a
register bit indicating that the controller has tried the transaction 3
times without receiving any response; the musb driver then just gives
back the URB with -EPROTO.

Regards,
-Bin.
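
P.S. To make the counter idea concrete, here is a rough userspace sketch,
not the actual musb code. The names struct ep_state, eproto_count,
EP_MAX_CONSECUTIVE_EPROTO and ep_giveback_status() are all made up for
illustration, and the threshold of 3 is only the example value mentioned
above. It also assumes that the counter saturates, so -EPIPE keeps being
reported until some other status resets it.

```c
#include <errno.h>

/* Hypothetical threshold; "3" is only the example value from the mail. */
#define EP_MAX_CONSECUTIVE_EPROTO 3

/* Hypothetical stand-in for the per-endpoint state in the HCD. */
struct ep_state {
	unsigned int eproto_count;	/* consecutive -EPROTO completions */
};

/*
 * Map the raw transaction status to the status given back with the URB.
 * Any status other than -EPROTO resets the counter and passes through.
 * Once -EPROTO has been seen EP_MAX_CONSECUTIVE_EPROTO times in a row,
 * -EPIPE is reported instead (assumption: the counter saturates, so
 * -EPIPE keeps being reported until a different status arrives).
 */
int ep_giveback_status(struct ep_state *ep, int status)
{
	if (status != -EPROTO) {
		ep->eproto_count = 0;
		return status;
	}

	if (ep->eproto_count < EP_MAX_CONSECUTIVE_EPROTO)
		ep->eproto_count++;

	if (ep->eproto_count >= EP_MAX_CONSECUTIVE_EPROTO)
		return -EPIPE;

	return -EPROTO;
}
```

With the threshold at 3, the first two consecutive failures would still
be given back as -EPROTO and the third and later ones as -EPIPE, so a
class driver that stops resubmitting on -EPIPE would stop there.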