Re: MUSB interrupt storm on device removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
> On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Jan 22, 2019 at 02:52:44PM -0600, Bin Liu wrote:
> > > Hi Johan,
> > > 
> > > On Tue, Jan 22, 2019 at 02:16:30PM -0600, Bin Liu wrote:
> > > > On Tue, Jan 22, 2019 at 05:19:39PM +0000, Måns Rullgård wrote:
> > > > > Bin Liu <b-liu@xxxxxx> writes:
> > > > > 
> > > > > > On Mon, Jan 21, 2019 at 09:20:52PM +0000, Måns Rullgård wrote:
> > > > > >> Bin Liu <b-liu@xxxxxx> writes:
> > > > > >> 
> > > > > >> > On Fri, Jan 18, 2019 at 08:15:02PM +0000, Måns Rullgård wrote:
> > > > > >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > > >> >> 
> > > > > >> >> > On Mon, Dec 17, 2018 at 09:36:17PM +0000, Måns Rullgård wrote:
> > > > > >> >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > > >> >> >> 
> > > > > >> >> >> > On Mon, Dec 17, 2018 at 07:16:08PM +0000, Måns Rullgård wrote:
> > > > > >> >> >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > > >> >> >> >> 
> > > > > >> >> >> >> > Hi,
> > > > > >> >> >> >> >
> > > > > >> >> >> >> > On Mon, Dec 17, 2018 at 03:13:12PM +0000, Måns Rullgård wrote:
> > > > > >> >> >> >> >> I have a strange problem with the musb driver in host mode on AM3358
> > > > > >> >> >> >> >> (beaglebone) hardware.  If I connect a multi-port serial adapter and
> > > > > >> >
> > > > > >> > Is this on beaglebone or beagleboneblack?
> > > > > >> 
> > > > > >> We've seen it with Beaglebone Enhanced [1] and BeagleCore [2] based
> > > > > >> devices.  Both are based on the AM3358 and mostly compatible with the
> > > > > >> Beaglebone Black.
> > > > > >> 
> > > > > >> [1] https://elinux.org/SanCloud_BeagleBoneEnhanced
> > > > > >> [2] http://beaglecore.com/products/bcm1str.html
> > > > > >
> > > > > > Okay, now I see you have a hub on the board. I just reproduced the
> > > > > > console lockup after added a hub in my setup. I will find some time to
> > > > > > debug this.
> > > > > >
> > > > > > FYI, there was a similar issue in a few years ago, which was that the
> > > > > > hub routine never got called during the storm then the device disconnect
> > > > > > event never got populated from the hub driver. I am wondering if this is
> > > > > > the same issue pops up again...
> > > > > 
> > > > > Now that you mentioned the hub, I tried connecting the serial adapter to
> > > > > the OTG port (in host mode) on the Beaglebone Enhanced (it's not
> > > > > normally used in this product).  When connected directly, the disconnect
> > > > > indeed works properly.  With a hub in between, it breaks.
> > > > 
> > > > Please use the following hack as a workaround for now. I will think more
> > > > about it.
> > > > 
> > > > Regards,
> > > > -Bin.
> > > > 
> > > > --------8<----------
> > > > diff git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> > > > index b59ce9ad14ce..3719f6d9b26c 100644
> > > > --- a/drivers/usb/musb/musb_host.c
> > > > +++ b/drivers/usb/musb/musb_host.c
> > > > @@ -1811,7 +1811,7 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> > > >         } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
> > > >                 musb_dbg(musb, "end %d RX proto error", epnum);
> > > >  
> > > > -               status = -EPROTO;
> > > > +               status = -EPIPE;
> > > >                 musb_writeb(epio, MUSB_RXINTERVAL, 0);
> > > >  
> > > >                 rx_csr &= ~MUSB_RXCSR_H_ERROR;
> > > 
> > > When a usb serial device is detached, the musb controller driver returns
> > > -EPROTO in urb giveback, then usb_serial_generic_read_bulk_callback()
> > > re-submit the urb, this circle causes interrupt storm, and the hub
> > > driver workqueue never got a chance to run to populate the device
> > > disconnect event, the whole console locks up. Do you think
> > > usb_serial_generic_read_bulk_callback() should handle -EPROTO as well?
> > 
> > Why is musb returning that error?  According to the documentation we
> > have about this type of thing
> > (Documentation/driver-api/usb/error-codes.rst), that should only happen
> > if one of the following just occured:
> > 	- bitstuff error
> > 	- no response packet received within the prescribed bus
> > 	  turn-around time
> > 	- unknown USB error
> 
> But we also have in that file:
> 
> 	This is also one of several codes that different kinds of host
> 	controller use to indicate a transfer has failed because of
> 	device disconnect.  In the interval before the hub driver starts
> 	disconnect processing, devices may receive such fault reports
> 	for every request.

Yes, it is such case - when a hub is in the topology the musb controller
itself doesn't report the device disconnect, it is the hub to do so.
>From the musb controller's perspective the usb transaction runs into the
PROTO error: the controller sent IN token 3 times but all didn't receive
the response packet (because the device is already removed), so the
controller generates error interrupt, then musb driver reports the PROTO
error to the usb core and class drivers.

> 
> > That's not what any other host controller returns when a device is
> > removed, so either you are going to have to fix all USB drives for this
> > issue, or you need to fix the musb driver to not send this error for
> > when a device is removed (hint, do the latter...)
> 
> Right, this needs to be handle at the HCD level.

Any reason usb_serial_generic_read_bulk_callback() doesn't handle
-EPROTO in the same way as -EPIPE?
 
> dwc2 fixed a similar lockup issue due to retried NAKed transaction by
> not retrying immediately:
> 
> 	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")

Both cases are all about device removal, but this musb case is slightly
different from this dwc2 case.

It is all about re-transmitting which causes interrupt storm, but in
this dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
makes sense to delay it in the dwc2 driver as this referred patch does,

but in this musb case, musb driver reports transaction error to the usb
serial driver, the usb serial driver issues the re-transmitting not the
musb driver, so I don't think the delay should be added in the musb
driver.

Regards,
-Bin.




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux