Re: MUSB interrupt storm on device removal

Bin Liu <b-liu@xxxxxx> · Wed, 23 Jan 2019 09:21:26 -0600

On Wed, Jan 23, 2019 at 03:55:47PM +0100, Johan Hovold wrote:
> On Wed, Jan 23, 2019 at 08:09:47AM -0600, Bin Liu wrote:
> > On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
> > > On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> 
> > > > That's not what any other host controller returns when a device is
> > > > removed, so either you are going to have to fix all USB drivers for this
> > > > issue, or you need to fix the musb driver to not send this error for
> > > > when a device is removed (hint, do the latter...)
> > > 
> > > Right, this needs to be handled at the HCD level.
> > 
> > Any reason usb_serial_generic_read_bulk_callback() doesn't handle
> > -EPROTO in the same way as -EPIPE?
> 
> Since it is supposed to be intermittent, unlike, for example, -ENOENT or
> -EPIPE (the latter of which the device driver can recover from if it cares
> to implement clearing of halt).

Okay, makes sense.
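
To spell out the distinction, here is a rough user-space model of how a
read completion handler might sort URB status codes. This is only a sketch
and not the actual usb-serial code: enum urb_action and
classify_read_status() are names made up for illustration, and the policy
for each status is my assumption based on the description above.

```c
#include <errno.h>

/* Hypothetical user-space model of a completion handler, not kernel code. */
enum urb_action {
	URB_STOP,	/* do not resubmit the URB */
	URB_RESUBMIT,	/* resubmit the URB right away */
};

static enum urb_action classify_read_status(int status)
{
	switch (status) {
	case 0:
		return URB_RESUBMIT;	/* data received, keep reading */
	case -ENOENT:
	case -ECONNRESET:
	case -ESHUTDOWN:
		return URB_STOP;	/* URB was killed or the device is going away */
	case -EPIPE:
		return URB_STOP;	/* endpoint halted, needs a clear-halt first */
	default:
		/* e.g. -EPROTO: assumed intermittent, so try again */
		return URB_RESUBMIT;
	}
}
```

With this policy, an HCD that keeps completing URBs with -EPROTO after the
device is gone makes the handler resubmit forever, which is the storm
being discussed.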

> 
> > > dwc2 fixed a similar lockup issue due to retried NAKed transaction by
> > > not retrying immediately:
> > > 
> > > 	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
> > 
> > Both cases are all about device removal, but this musb case is slightly
> > different from this dwc2 case.
> > 
> > It is all about re-transmitting which causes interrupt storm, but in
> > this dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
> > makes sense to delay it in the dwc2 driver as this referred patch does,
> >
> > but in this musb case, the musb driver reports a transaction error to
> > the usb serial driver, and it is the usb serial driver, not the musb
> > driver, that issues the re-transmit, so I don't think the delay should
> > be added in the musb driver.
> 
> I didn't say it was exactly the same.

Yeah, I know. My point was that the fix goes in the place where the
re-transmitting happens, but

> My point was that unless you fix this at the HCD level, you will need to
> add complex recovery handling to every USB driver and completion handler
> (~500 of those). But perhaps that is what is needed.

Okay, it probably makes sense to handle this case in the HCD, because
there are far fewer HCDs than USB drivers.

> I do see now that of all USB drivers we have two drivers that handle
> -EPROTO by resubmitting after a delay, while a handful explicitly deals
> with -EPROTO by simply stopping to resubmit (some probably bail out on
> all errors, but the majority appear to resubmit on -EPROTO).

Thanks for the info.
I will handle this case in the musb driver.
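
For reference, the driver-side alternatives mentioned above (resubmit
after a delay, or stop resubmitting) could be modelled roughly as below.
This is a user-space sketch under my own assumptions, not code from any
existing driver: struct read_state, next_resubmit_delay_ms(), the retry
limit and the delay value are all hypothetical.

```c
#include <errno.h>

#define MAX_EPROTO_RETRIES	3	/* hypothetical limit */
#define RETRY_DELAY_MS		100	/* hypothetical back-off */

/* Hypothetical per-endpoint state kept by the driver. */
struct read_state {
	unsigned int eproto_count;	/* consecutive -EPROTO completions */
};

/*
 * Returns the delay in milliseconds before the next resubmit,
 * or -1 if the driver should stop resubmitting.
 */
static int next_resubmit_delay_ms(struct read_state *st, int status)
{
	if (status == 0) {
		st->eproto_count = 0;	/* success resets the error streak */
		return 0;
	}

	if (status != -EPROTO)
		return -1;		/* other errors: stop in this model */

	if (++st->eproto_count > MAX_EPROTO_RETRIES)
		return -1;		/* too many in a row, give up */

	return RETRY_DELAY_MS;		/* back off instead of retrying at once */
}
```

Either way the resubmit rate becomes bounded, but as noted it would have
to be repeated in every completion handler.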

Regards,
-Bin.