Felipe,

On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson <dianders at chromium.org> wrote:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lock up, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer.
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end up with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders at chromium.org>
> Reviewed-by: Julius Werner <jwerner at chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren at i2se.com>
> Acked-by: John Youn <johnyoun at synopsys.com>
> ---
>
> Changes in v4:
> - Removed Cc for stable as per Felipe's request in v3
> - Rebased and squashed the two patches since Kees' timer stuff landed
> - Add John Youn's Ack.
>
> Changes in v3:
> - Add tested-by for Stefan Wahren
> - Sent to Felipe Balbi as candidate to land this.
> - Add Cc for stable (it's always been broken so go as far back as is easy)
>
> Changes in v2:
> - Address http://crosreview.com/737520 feedback
>
>  drivers/usb/dwc2/core.h      |  1 +
>  drivers/usb/dwc2/hcd.c       |  7 ++++
>  drivers/usb/dwc2/hcd.h       |  9 +++++
>  drivers/usb/dwc2/hcd_intr.c  | 20 +++++++++++
>  drivers/usb/dwc2/hcd_queue.c | 81 +++++++++++++++++++++++++++++++++++++++++---
>  5 files changed, 114 insertions(+), 4 deletions(-)

I don't mean to be a pest, but I'm hoping that we can land this
somewhere (if nothing else in your /next tree) just so it doesn't miss
another release cycle.  If you're not so keen on collecting dwc2 host
patches these days, I can also see if Greg KH is willing to take this
directly into his tree.  Please let me know.

Thanks for your time!  :)

-Doug
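
For anyone skimming the thread, the "only delay after repeated NAKs" idea
from the commit message can be sketched roughly as below.  This is a
standalone userspace illustration, not the code from the patch: the names
(nak_tracker, on_nak, on_progress) and the threshold of 3 are made up for
the example, and the real driver would arm a timer where this sketch just
returns RETRY_LATER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold; the commit message does not give a number. */
#define NAKS_BEFORE_DELAY 3

/* Per-endpoint bookkeeping (hypothetical struct, for illustration only). */
struct nak_tracker {
	unsigned int consecutive_naks;	/* NAKs since the last real progress */
};

enum retry_action {
	RETRY_NOW,	/* resubmit straight from the interrupt handler */
	RETRY_LATER,	/* defer the retry (e.g. via a ~1 ms timer) */
};

/*
 * Called when a transaction is NAKed.  The first few NAKs are retried
 * immediately so throughput is not hurt; after that the retry is
 * deferred so the CPU can leave the interrupt handler.
 */
static enum retry_action on_nak(struct nak_tracker *t)
{
	assert(t != NULL);

	if (t->consecutive_naks < NAKS_BEFORE_DELAY) {
		t->consecutive_naks++;
		return RETRY_NOW;
	}
	return RETRY_LATER;
}

/* Called when the device returns something other than a NAK. */
static void on_progress(struct nak_tracker *t)
{
	assert(t != NULL);
	t->consecutive_naks = 0;
}
```

With this threshold, the first three NAKs in a row retry immediately, the
fourth and later ones are deferred, and the counter starts over as soon as
the device makes progress.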