Re: EPROTO when USB 3 GbE adapters are under load

On 25.10.2018 12:52, Hao Wei Tee wrote:
> On 25/10/18 4:45 PM, Mathias Nyman wrote:
>> Reproducing the issue with a recent kernel with xhci traces enabled should show the reason for the EPROTO error.
>>
>> Enable xhci traces before triggering the issue with:
>>
>> mount -t debugfs none /sys/kernel/debug
>> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
>> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>>
>> After the issue is triggered, save and send the trace at /sys/kernel/debug/tracing/trace
>> Note that it might be huge.
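A minimal Python sketch of the same capture steps, in case the reproduction is scripted. It assumes debugfs is already mounted (as in the first command) and that the script runs as root. The function names are hypothetical, the gzip step is an addition to cope with the trace being huge, and the `tracing` argument exists only so the paths can be redirected, e.g. for testing.

```python
import gzip
import shutil
from pathlib import Path

# Paths from the commands above; assumes debugfs is mounted at /sys/kernel/debug.
TRACING = Path("/sys/kernel/debug/tracing")


def enable_xhci_tracing(tracing: Path = TRACING, buffer_kb: int = 81920) -> None:
    """Enlarge the trace buffer (the size is per CPU) and enable all xhci-hcd events."""
    (tracing / "buffer_size_kb").write_text(f"{buffer_kb}\n")
    (tracing / "events" / "xhci-hcd" / "enable").write_text("1\n")


def save_trace(dest: Path, tracing: Path = TRACING) -> None:
    """After the issue is triggered, stream the trace into a gzip-compressed file."""
    with open(tracing / "trace", "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)


def disable_xhci_tracing(tracing: Path = TRACING) -> None:
    """Stop recording xhci-hcd events once the trace has been saved."""
    (tracing / "events" / "xhci-hcd" / "enable").write_text("0\n")
```

Streaming with `shutil.copyfileobj` avoids holding the whole trace in memory.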

> Thanks for the suggestion.
>
> Here[1] is (part of) the trace, starting about 250 lines before the EPROTO error occurs.
>
> [1]: https://gist.githubusercontent.com/angelsl/fdd04d2bded3a41029122b0536c00944/raw/b8e9f7d2695ac030b7f3dd53a1a9c3f37da7b7a0/trace
>
> The first error occurs at line 243 of the trace (timestamp 8144.248398), coinciding with the start of the errors spewed into dmesg:
>
> [ 8144.245359] r8152 2-2:1.0 enp0s20f0u2: Rx status -71
> [ 8144.248837] r8152 2-2:1.0 enp0s20f0u2: Rx status -71
> [ 8144.252392] r8152 2-2:1.0 enp0s20f0u2: Rx status -71
> [ 8144.255987] r8152 2-2:1.0 enp0s20f0u2: Stop submitting intr, status -71
>
> Thanks,
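Locating the first error in a large trace and decoding the dmesg status values can be scripted along these lines. This is a hypothetical sketch: the helper names are invented, and the exact wording of a transaction error in the xhci trace depends on the kernel version, so any search string (for example "Transaction Error") is an assumption to be checked against the actual trace.

```python
import errno
import re
from typing import Iterable, List, Optional

# Matches the "... status -71" part of r8152 messages like those quoted above.
STATUS_RE = re.compile(r"status (-\d+)")


def first_line_containing(lines: Iterable[str], needle: str) -> Optional[int]:
    """Return the 1-based number of the first line containing needle, or None."""
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            return lineno
    return None


def status_names(dmesg_lines: Iterable[str]) -> List[str]:
    """Translate the negative errno values in dmesg lines into symbolic names.

    errno numbers are platform-specific; on Linux, 71 is EPROTO.
    """
    names = []
    for line in dmesg_lines:
        match = STATUS_RE.search(line)
        if match:
            code = -int(match.group(1))
            names.append(errno.errorcode.get(code, str(code)))
    return names
```

For example, `first_line_containing(open("trace"), "Transaction Error")` would report a line number comparable to the "line 243" above, provided the search string matches what the kernel actually prints.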

The xHC reports a transaction error on one of the bulk TRBs.

The transaction error causes the endpoint to halt (a host-side halt only).
The xhci driver resets the host-side endpoint to recover from the halt,
then returns the failed URB (TRB) with -EPROTO status, and moves past this TRB.
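The recovery order described above can be sketched as a small model. This is a hypothetical illustration rather than the xhci driver's code: `HostEndpoint` and its fields are invented names, and only the order of the steps comes from the description.

```python
from dataclasses import dataclass, field
from typing import List

EPROTO = 71  # errno value for "Protocol error" on Linux


@dataclass
class HostEndpoint:
    """Hypothetical, simplified model of a host-side bulk endpoint."""
    seq: int = 0        # host-side sequence number
    dequeue: int = 0    # index of the next TRB the controller will process
    steps: List[str] = field(default_factory=list)

    def handle_transaction_error(self) -> int:
        """Recover from a transaction error on the TRB at the dequeue pointer.

        Returns the status handed back with that TRB's URB.
        """
        failed = self.dequeue
        # 1. The endpoint halts on the host side only; the device is not told.
        self.steps.append(f"host-side halt on TRB {failed}")
        # 2. Resetting the host-side endpoint clears its sequence number.
        self.seq = 0
        self.steps.append("host-side endpoint reset, seq cleared")
        # 3. The URB for the failed TRB is given back with -EPROTO.
        self.steps.append(f"URB for TRB {failed} returned with -EPROTO")
        # 4. The dequeue pointer moves past the failed TRB.
        self.dequeue = failed + 1
        self.steps.append(f"dequeue moved to TRB {self.dequeue}")
        return -EPROTO
```

Note that none of these steps touches the device's own sequence number.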

The interesting thing here is that every TRB queued after the one with the transaction error
also triggers a transaction error.

This might be a data toggle/sequence number sync issue.
The host-side endpoint reset clears the host-side sequence number,
and the host expects the device-side endpoint to be reset and its sequence number cleared
as well, as a result of returning -EPROTO.
If I remember correctly, the xhci driver does not wait for the device-side endpoint to be reset,
so any TRBs still in the queue will be transferred with a cleared sequence number
that is out of sync with the device side.
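To see why every queued TRB after the first failure could also fail, here is a deliberately simplified model of the sync issue described above. It assumes a transfer succeeds only when the host and device sequence numbers match, that neither side advances on a failed transfer, and that only the host side is reset on error; real SuperSpeed link behavior is more involved (USB 2 uses DATA0/DATA1 toggles instead). `simulate_hard_reset` is a hypothetical name. SuperSpeed sequence numbers are a 5-bit field, hence the wrap at 32.

```python
from typing import List

SEQ_MOD = 32  # SuperSpeed sequence numbers are a 5-bit field


def simulate_hard_reset(n_trbs: int, fail_at: int) -> List[str]:
    """Return the outcome of each queued TRB when one transfer is corrupted.

    Simplified model: a TRB succeeds only if host and device sequence numbers
    agree. On error, only the host-side sequence number is cleared; the
    device side is left untouched, since nothing waits for it to be reset.
    """
    host_seq = dev_seq = 0
    outcomes = []
    for i in range(n_trbs):
        if i == fail_at or host_seq != dev_seq:
            outcomes.append("EPROTO")
            host_seq = 0  # host-side endpoint reset
        else:
            outcomes.append("ok")
            host_seq = (host_seq + 1) % SEQ_MOD
            dev_seq = (dev_seq + 1) % SEQ_MOD
    return outcomes
```

With `simulate_hard_reset(8, fail_at=3)` the first three TRBs succeed, and from the corrupted one onward every TRB fails, because the host restarts at 0 while the device still expects 3.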

There is a patch in usb-next that might help:
f8f80be xhci: Use soft retry to recover faster from transaction errors

It soft-resets the halted host-side endpoint, clearing the halt without clearing the sequence number.
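For contrast, a sketch of the soft-retry idea in the same simplified terms: on a transaction error the host-side halt is cleared but the sequence number is preserved, so host and device stay in sync. As I understand the xHCI soft retry mechanism (a Reset Endpoint command with the Transfer State Preserve flag set), the failed transfer is retried rather than given back. The retry limit `max_retries`, the `fail_attempts` input (how many consecutive attempts on one TRB hit a transient error), and the fallback to a hard reset once retries run out are all assumptions of this hypothetical model, not details taken from the patch.

```python
from typing import Dict, List

SEQ_MOD = 32  # SuperSpeed sequence numbers are a 5-bit field


def simulate_soft_retry(n_trbs: int, fail_attempts: Dict[int, int],
                        max_retries: int = 3) -> List[str]:
    """Return the outcome of each queued TRB under a soft-retry policy."""
    host_seq = dev_seq = 0
    outcomes = []
    for i in range(n_trbs):
        if host_seq != dev_seq:
            # Out of sync (only reachable after a fallback hard reset).
            outcomes.append("EPROTO")
            host_seq = 0
            continue
        errors = fail_attempts.get(i, 0)
        if errors > max_retries:
            # Assumed fallback: retries exhausted, hard reset clears host_seq.
            outcomes.append("EPROTO")
            host_seq = 0
            continue
        # Each failed attempt is soft-retried with host_seq preserved, so the
        # eventual successful attempt is still in sync with the device.
        outcomes.append("ok" if errors == 0 else "ok (retried)")
        host_seq = (host_seq + 1) % SEQ_MOD
        dev_seq = (dev_seq + 1) % SEQ_MOD
    return outcomes
```

A single transient error now costs one retry instead of breaking the rest of the queue; only if the retries are exhausted does the model fall back into the desync seen with a plain host-side reset.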

-Mathias


