Re: MUSB interrupt storm on device removal

Måns Rullgård <mans@xxxxxxxxx> · Tue, 05 Mar 2019 11:30:28 +0000

Bin Liu <b-liu@xxxxxx> writes:

> On Wed, Jan 23, 2019 at 03:55:47PM +0100, Johan Hovold wrote:
>> On Wed, Jan 23, 2019 at 08:09:47AM -0600, Bin Liu wrote:
>> > On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
>> > > On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
>> 
>> > > > That's not what any other host controller returns when a device is
>> > > > removed, so either you are going to have to fix all USB drivers for this
>> > > > issue, or you need to fix the musb driver to not send this error for
>> > > > when a device is removed (hint, do the latter...)
>> > > 
>> > > Right, this needs to be handled at the HCD level.
>> > 
>> > Any reason usb_serial_generic_read_bulk_callback() doesn't handle
>> > -EPROTO in the same way as -EPIPE?
>> 
>> Since it is supposed to be intermittent, unlike, for example, -ENOENT or
>> -EPIPE (the latter of which the device driver can recover from if it
>> cares to implement clearing of halt).
>
> Okay, makes sense.
>
>> 
>> > > dwc2 fixed a similar lockup issue caused by retried NAKed transactions
>> > > by not retrying immediately:
>> > > 
>> > > 	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
>> > 
>> > Both cases are about device removal, but this musb case is slightly
>> > different from the dwc2 case.
>> > 
>> > In both, it is the re-transmitting that causes the interrupt storm. In
>> > the dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
>> > makes sense to delay it in the dwc2 driver, as the referenced patch does.
>> >
>> > In this musb case, however, the musb driver reports a transaction error
>> > to the usb serial driver, and it is the usb serial driver, not the musb
>> > driver, that issues the re-transmission, so I don't think the delay
>> > should be added in the musb driver.
>> 
>> I didn't say it was exactly the same.
>
> Yeah, I know. My point was that the fix belongs where the re-transmitting
> happens, but
>
>> My point was that unless you fix this at the HCD level, you will need to
>> add complex recovery handling to every USB driver and completion handler
>> (~500 of those). But perhaps that is what is needed.
>
> Okay, it probably makes sense to handle the case in the HCD, since there
> are far fewer HCDs.
>
>> I do see now that, of all USB drivers, two handle -EPROTO by
>> resubmitting after a delay, while a handful explicitly deal with -EPROTO
>> by simply no longer resubmitting (some probably bail out on all errors,
>> but the majority appear to resubmit on -EPROTO).
>
> Thanks for the info.
> I will handle this case in the musb driver.

What's happening to this?  There's no immediate urgency from my side,
but I don't want it to get forgotten either.

-- 
Måns Rullgård