Re: MUSB interrupt storm on device removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 22, 2019 at 02:52:44PM -0600, Bin Liu wrote:
> > Hi Johan,
> > 
> > On Tue, Jan 22, 2019 at 02:16:30PM -0600, Bin Liu wrote:
> > > On Tue, Jan 22, 2019 at 05:19:39PM +0000, Måns Rullgård wrote:
> > > > Bin Liu <b-liu@xxxxxx> writes:
> > > > 
> > > > > On Mon, Jan 21, 2019 at 09:20:52PM +0000, Måns Rullgård wrote:
> > > > >> Bin Liu <b-liu@xxxxxx> writes:
> > > > >> 
> > > > >> > On Fri, Jan 18, 2019 at 08:15:02PM +0000, Måns Rullgård wrote:
> > > > >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > >> >> 
> > > > >> >> > On Mon, Dec 17, 2018 at 09:36:17PM +0000, Måns Rullgård wrote:
> > > > >> >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > >> >> >> 
> > > > >> >> >> > On Mon, Dec 17, 2018 at 07:16:08PM +0000, Måns Rullgård wrote:
> > > > >> >> >> >> Bin Liu <b-liu@xxxxxx> writes:
> > > > >> >> >> >> 
> > > > >> >> >> >> > Hi,
> > > > >> >> >> >> >
> > > > >> >> >> >> > On Mon, Dec 17, 2018 at 03:13:12PM +0000, Måns Rullgård wrote:
> > > > >> >> >> >> >> I have a strange problem with the musb driver in host mode on AM3358
> > > > >> >> >> >> >> (beaglebone) hardware.  If I connect a multi-port serial adapter and
> > > > >> >
> > > > >> > Is this on beaglebone or beagleboneblack?
> > > > >> 
> > > > >> We've seen it with Beaglebone Enhanced [1] and BeagleCore [2] based
> > > > >> devices.  Both are based on the AM3358 and mostly compatible with the
> > > > >> Beaglebone Black.
> > > > >> 
> > > > >> [1] https://elinux.org/SanCloud_BeagleBoneEnhanced
> > > > >> [2] http://beaglecore.com/products/bcm1str.html
> > > > >
> > > > > Okay, now I see you have a hub on the board. I just reproduced the
> > > > > console lockup after added a hub in my setup. I will find some time to
> > > > > debug this.
> > > > >
> > > > > FYI, there was a similar issue in a few years ago, which was that the
> > > > > hub routine never got called during the storm then the device disconnect
> > > > > event never got populated from the hub driver. I am wondering if this is
> > > > > the same issue pops up again...
> > > > 
> > > > Now that you mentioned the hub, I tried connecting the serial adapter to
> > > > the OTG port (in host mode) on the Beaglebone Enhanced (it's not
> > > > normally used in this product).  When connected directly, the disconnect
> > > > indeed works properly.  With a hub in between, it breaks.
> > > 
> > > Please use the following hack as a workaround for now. I will think more
> > > about it.
> > > 
> > > Regards,
> > > -Bin.
> > > 
> > > --------8<----------
> > > diff git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> > > index b59ce9ad14ce..3719f6d9b26c 100644
> > > --- a/drivers/usb/musb/musb_host.c
> > > +++ b/drivers/usb/musb/musb_host.c
> > > @@ -1811,7 +1811,7 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> > >         } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
> > >                 musb_dbg(musb, "end %d RX proto error", epnum);
> > >  
> > > -               status = -EPROTO;
> > > +               status = -EPIPE;
> > >                 musb_writeb(epio, MUSB_RXINTERVAL, 0);
> > >  
> > >                 rx_csr &= ~MUSB_RXCSR_H_ERROR;
> > 
> > When a usb serial device is detached, the musb controller driver returns
> > -EPROTO in urb giveback, then usb_serial_generic_read_bulk_callback()
> > re-submit the urb, this circle causes interrupt storm, and the hub
> > driver workqueue never got a chance to run to populate the device
> > disconnect event, the whole console locks up. Do you think
> > usb_serial_generic_read_bulk_callback() should handle -EPROTO as well?
> 
> Why is musb returning that error?  According to the documentation we
> have about this type of thing
> (Documentation/driver-api/usb/error-codes.rst), that should only happen
> if one of the following just occured:
> 	- bitstuff error
> 	- no response packet received within the prescribed bus
> 	  turn-around time
> 	- unknown USB error

But we also have in that file:

	This is also one of several codes that different kinds of host
	controller use to indicate a transfer has failed because of
	device disconnect.  In the interval before the hub driver starts
	disconnect processing, devices may receive such fault reports
	for every request.

> That's not what any other host controller returns when a device is
> removed, so either you are going to have to fix all USB drives for this
> issue, or you need to fix the musb driver to not send this error for
> when a device is removed (hint, do the latter...)

Right, this needs to be handle at the HCD level.

dwc2 fixed a similar lockup issue due to retried NAKed transaction by
not retrying immediately:

	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")

Note that there are some later follow-up patches to that one. Might give
some inspiration though.

Johan



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux