Hi,

Douglas Anderson <dianders@xxxxxxxxxxxx> writes:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts was (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Reviewed-by: Julius Werner <jwerner@xxxxxxxxxxxx>
> Tested-by: Stefan Wahren <stefan.wahren@xxxxxxxx>

This seems too big for -rc or -stable inclusion. In any case, this
doesn't apply to my testing/next branch. Care to rebase and collect acks
you received while doing that?

-- 
balbi
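
For readers following the thread, the retry policy in the quoted description (retry immediately for the first few NAKs, then back off by about 1 ms) could be sketched roughly as below. This is a standalone userspace illustration, not the dwc2 code: struct split_xfer, retry_delay_after_nak(), transfer_made_progress() and the threshold of 3 NAKs are all hypothetical, since the description only says "repeated NAKs". Only the 1 ms target comes from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: the description only says "repeated NAKs". */
#define NAKS_BEFORE_DELAY 3
/* Target delay from the description: about 1 ms between retries. */
#define RETRY_DELAY_US 1000

/* Hypothetical per-transfer state, not a dwc2 structure. */
struct split_xfer {
	unsigned int nak_count;	/* consecutive NAKs counted so far */
};

/*
 * Called when a CSPLIT is NAKed.  Returns how long to wait before
 * restarting the split transaction, in microseconds.  0 means "retry
 * right away from the interrupt handler"; a non-zero value means
 * "leave the handler and retry later from a timer".
 */
unsigned int retry_delay_after_nak(struct split_xfer *x)
{
	assert(x != NULL);

	if (x->nak_count < NAKS_BEFORE_DELAY) {
		x->nak_count++;
		return 0;
	}
	return RETRY_DELAY_US;
}

/* Called when the device finally returns data: start counting afresh. */
void transfer_made_progress(struct split_xfer *x)
{
	assert(x != NULL);
	x->nak_count = 0;
}
```

With these assumed values the first three NAKs are retried immediately, so a device that answers after a short stall sees no added latency, and every further NAK is retried only after the delay, which is what lets the CPU leave the interrupt routine.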