Hi all,

We found that this is a known issue in the Synopsys DWC3 USB controller:
when PARKMODE_SS is enabled, the controller may hang or schedule TRBs
incorrectly under some heavy-load conditions. Setting DISABLE_PARKMODE_SS
to 1 works around this bug.

Thank you for your help.

alex zheng <tc0721@xxxxxxxxx> wrote on Thu, Sep 26, 2019 at 7:34 PM:
>
> add log file.
>
> alex zheng <tc0721@xxxxxxxxx> wrote on Thu, Sep 26, 2019 at 6:38 PM:
> >
> > Hi,
> >
> > Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote on Thu, Sep 26, 2019 at 4:19 PM:
> > >
> > > On 26.9.2019 8.45, Felipe Balbi wrote:
> > > >
> > > > Hi,
> > > >
> > > > David Laight <David.Laight@xxxxxxxxxx> writes:
> > > >> From: Mathias Nyman
> > > >>> Sent: 25 September 2019 15:48
> > > >>>
> > > >>> On 24.9.2019 17.45, alex zheng wrote:
> > > >>>> Hi Mathias,
> > > >> ...
> > > >>> Logs show your transfer ring has four segments, but hardware fails to
> > > >>> jump from the last segment back to the first.
> > > >>>
> > > >>> Last TRB (LINK TRB) of each segment points to the next segment,
> > > >>> last segment's link TRB points back to the first segment.
> > > >>>
> > > >>> In your case:
> > > >>> 0x1d117000 -> 0x1eb09000 -> 0x1eb0a000 -> 0x1dbda000 -> (back to 0x1d117000)
> > > >>>
> > > >>> For some reason your hardware doesn't treat the last TRB at the last segment
> > > >>> as a LINK TRB, instead it just issues a transfer event for it, and continues to
> > > >>> the next address instead of jumping back to the first segment:
> > > >>
> > > >> That could be a cache coherency (or flushing (etc)) issue.
> > >
> > > The Link TRB is written very early, right after the ring segment is allocated,
> > > and before any other TRBs. 255 other TRBs were written and handled by hw
> > > on this segment after this, so not very likely a flushing/cache coherency issue.
> > >
> > I added a flush_cache_all() after every queue_trb, but it made no
> > difference. It does not seem to be a flushing/cache coherency issue.
> >
> > The flush was added like this:
> >     inc_enq(xhci, ring, more_trbs_coming);
> > +   flush_cache_all();
> >
> > > >
> > > > XHCI has a HW-configurable maximum number of segments in a ring. AFAICT,
> > > > xhci driver doesn't take that into consideration today. Perhaps the HW
> > > > in question doesn't like more than 3 segments.
> > > >
> > > > Mathias, what was the register to check this? Do you remember?
> > > >
> > >
> > > I only recall a limit for the event ring in the HCSPARAMS2 register (ERST MAX),
> > > not for transfer rings.
> > >
> > > Other things to look at would be
> > >
> > > - check that Toggle Cycle bit is correct for last segment's link TRB (incomplete logs)
> >
> > I dumped another error log; for more complete logs see the attached
> > file (transfer_error_0926.cap). In the log:
> > the error link TRB:
> > 0x1d00dff0: TRB 000000001d068000 status 'Invalid' len 0 slot 0 ep 0 type 'Link' flags e:c
> > and the last segment's link TRB:
> > 0x1eb0aff0: TRB 000000001d00d000 status 'Invalid' len 0 slot 0 ep 0 type 'Link' flags e:C
> >
> > > - some old xHCI hardware needed the Chain bit set in link TRB for some isoc rings
> > The xHCI version is 1.1:
> > [    6.888570] c1 46 (kworker/u8:1) xhci-hcd xhci-hcd.0.auto: HCIVERSION: 0x110
> >
> > > - was ring recently expanded?, usually rings start with only two segments
> > The extra segments are added after the raw data test has run for a while,
> > especially when the RNDIS test (iperf3) begins to run.
> >
> > Other info:
> > 1. This issue seems to happen only when the raw bulk data test and the
> >    RNDIS test (on another pair of endpoints) run at the same time, and it
> >    happens more often if we queue TRBs more quickly.
> > 2. The raw bulk data test case is a libusb test that uses ep4(in) & ep3(out)
> >    to transfer raw bulk data, and I use iperf3 (TCP) to test USB RNDIS.
> > 3. The attached log file shows only the ep4(in) enqueue/dequeue log, to
> >    keep it readable.
> > 4. More test results are shown below:
> >    1) run just one raw bulk data test --> (always fine)
> >    2) run raw bulk data test + RNDIS test at the same
> >       time --> (transfer error in 10 minutes)
> >    3) run two raw bulk data tests at the same time (with
> >       two pairs of endpoints) --> (transfer error in 10 minutes)
> > 5. I tried modifying DWC3 hw registers such as the TX/RX FIFO size and
> >    GTXTHRCFG/GRXTHRCFG, but that did not help either.
> > 6. Related interface info:
> >    8801 I:* If#= 0 Alt= 0 #EPs= 1 Cls=e0(wlcon) Sub=01 Prot=03 Driver=rndis_host
> >    8802 E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
> >    8803 I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host -----> used in rndis test
> >    8804 E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >    8805 E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >    8809 I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=43 Prot=01 Driver=(none) -----> used in raw bulk test
> >    8810 E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >    8811 E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >    8820 I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=43 Prot=01 Driver=(none) ----> used in double raw bulk test
> >    8821 E: Ad=06(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >    8822 E: Ad=88(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
> >
> > It seems that there are some conflicts when multiple endpoints work at
> > the same time on our SoC. Is there anything else we can try?
> >
> >
> > >
> > > Mathias
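
P.S. For reference, the workaround described at the top of this message
(setting DISABLE_PARKMODE_SS to 1) amounts to a read-modify-write of a single
bit in the DWC3 GUCTL1 register. The following is only a minimal sketch in
plain C, not the actual patch: the register offset (0xc11c) and the bit
position (17) are assumptions based on the mainline Linux dwc3 driver
(drivers/usb/dwc3/core.h) and should be checked against the databook for your
core version, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTIONS: offset and bit position are believed to match mainline
 * Linux (drivers/usb/dwc3/core.h); verify them against the Synopsys
 * databook for your DWC3 revision before relying on them.
 */
#define GUCTL1_OFFSET               0xc11cu      /* assumed GUCTL1 offset */
#define GUCTL1_PARKMODE_DISABLE_SS  (1u << 17)   /* assumed bit position  */

/*
 * Hypothetical helper: given the current GUCTL1 value, return the value
 * to write back so that SuperSpeed park mode is disabled. All other
 * bits are preserved (classic read-modify-write).
 */
static uint32_t guctl1_disable_parkmode_ss(uint32_t guctl1)
{
    return guctl1 | GUCTL1_PARKMODE_DISABLE_SS;
}

/*
 * Hypothetical wrapper for a memory-mapped register block; `regs` must
 * point at the base that GUCTL1_OFFSET is relative to.
 */
static void dwc3_apply_parkmode_workaround(volatile uint32_t *regs)
{
    volatile uint32_t *guctl1;

    assert(regs != 0);
    guctl1 = regs + GUCTL1_OFFSET / sizeof(uint32_t);
    *guctl1 = guctl1_disable_parkmode_ss(*guctl1);
}
```

In a real driver this would be done once during core initialization, before
the host controller starts scheduling transfers. As far as I know, later
mainline kernels expose the same setting through the
"snps,parkmode-disable-ss-quirk" device-tree property, so on those kernels no
code change should be needed.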