Hi, On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote: > Hi, > > On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote: > > 2016-04-27 18:46 GMT+03:00 Bin Liu <b-liu@xxxxxx>: > > > Hi, > > > > > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote: > > >> Fix soft lockup when resetting remote device attached > > >> to usb host. Configuration: > > >> pppd -> musb hub -> usb-serial -> gsm modem > > > > > > I have heard a few reports similar to this symptom, but never been able > > > to reproduce it on my side. > > > > > > > Ok, I can reproduce it almost very easy. > > > > >> When gsm modem resets, musb rolls in incoming rx interrupts > > >> which does not give any time to other application as result > > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR > > > > > > Have you looked where exact place in the interrupt routine the execution > > > has stuck in? > > > > > > > It does not stuck. It goes to that line which print proto error over > > and over again and > > nothing stops that. After some time kernel reports lockup. But > > actually it's not stuck, > > all cpu time was eaten by executing that handlers. > > > > > > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code > > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS. > > > > > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures > > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in > > > musb_regs.h. > > > > > >> > > >> Signed-off-by: Max Uvarov <muvarov@xxxxxxxxx> > > >> --- > > >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov). > > >> > > >> drivers/usb/musb/musb_host.c | 12 +++++------- > > >> 1 file changed, 5 insertions(+), 7 deletions(-) > > >> > > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c > > >> index c3d5fc9..2d9aa78 100644 > > >> --- a/drivers/usb/musb/musb_host.c > > >> +++ b/drivers/usb/musb/musb_host.c > > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum) > > > > > > What kernel do you use? This line # is away off from upstream kernel. > > > > > > > I did this patch for 4.1 but 4.6 has the same problem and patch > > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This > > interrupt handler has the same code. And looks like on 3.14 > > Yeah, this code hasn't been chaned for year. But in general, it is > prepfered to create patches on latest kernel to avoid other headache. > > > everything worked. I don't have a time to diff 2 versions. Might be > > regression. > > > > > > >> > > >> /* stall; record URB status */ > > >> status = -EPIPE; > > >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) { > > >> > > >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) { > > >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum); > > >> - > > >> - status = -EPROTO; > > >> - musb_writeb(epio, MUSB_RXINTERVAL, 0); > > >> - > > >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) { > > >> + if (rx_csr & MUSB_RXCSR_H_ERROR) { > > >> + status = -EPROTO; > > >> + musb_writeb(epio, MUSB_RXINTERVAL, 0); > > >> + } > > > > > > Please help me to understand how this change fixes the issue. I see the > > > most effect of the change here is directly 'goto finish' so that 'done' > > > flag is not set, then musb_advance_schedule() is not called. Is this the > > > case or I missed other important pieces? > > > > > > > Right that is the goal. On this rxcsr_h_error kernel reschedules > > current interrupt. And that continues forever. For example adding > > The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR > bit, but the current driver doesn't. I am wondering if this causes the > controller keeps generating the same interrupt. Can you please try the > following change instead to see if the lockup goes away? > > @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum) > status = -EPROTO; > musb_writeb(epio, MUSB_RXINTERVAL, 0); > > + rx_csr &= ~MUSB_RXCSR_H_ERROR; > + musb_writew(epio, MUSB_RXCSR, rx_csr); + goto finish; Please also add the line above. I will spend more time to understand what is happening... First of all, I don't like the idea of merging the two branches, it makes the code ugly. Regards, -Bin. > + > } else if (rx_csr & MUSB_RXCSR_DATAERROR) { > > if (USB_ENDPOINT_XFER_ISOC != qh->type) { > > Regards, > -Bin. > > > msleep() can give some time for other processes. I'm not an expert in > > this chip but I think that right solution in that case is not try to > > reschedule and quick and allow hub to make reset and once again init > > all devices (in my case ppp/pppd also shutdowns and then I bring > > everything up with script.). The same behavior with dma and pio mode. > > > > Regards, > > Max. > > > > > Thanks, > > > -Bin. > > > > > >> > > >> if (USB_ENDPOINT_XFER_ISOC != qh->type) { > > >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum); > > >> -- > > >> 1.9.1 > > >> > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in > > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > -- > > Best regards, > > Maxim Uvarov -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html