Re: bug report : TBR overflow

On 01.03.2018 22:56, ljoublanc@xxxxxxxxx wrote:
I'm a user of Gentoo Linux, and would like to raise a bug report. I hope these
are the correct channels; if not, I apologise in advance.

I'm using a Realtek-based USB3 to RJ45 gigabit adapter. This plugs
directly into my laptop, which is a Skylake-based Toshiba Radius P20W-C-103,
with the following controller:

```
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0
xHCI Controller (rev 21)
```

I am experiencing this on 4.9.79-r1, and also 4.14.22.

When I plug the device in, unless I disable power management on USB
hubs 3 and 4, I get errors saying 'root hub lost power or was reset'.
However, if I disable PM using powertop, I get the device to work
seemingly well. But as soon as I start heavy transfers (in my case, a
distributed compile), the network device stops responding.

The error messages that I'm receiving are very similar to a bug described
here: https://bugs.launchpad.net/dell-sputnik/+bug/1729674
But I've been told that it's unrelated.

First, these are the logs showing the dongle being plugged in.
```
Feb 26 20:17:09 nizuc kernel: usb usb3: root hub lost power or was reset
Feb 26 20:17:09 nizuc kernel: usb usb4: root hub lost power or was reset
Feb 26 20:17:41 nizuc kernel: usb 4-1: new SuperSpeed USB device
number 2 using xhci_hcd
Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device found,
idVendor=0bda, idProduct=8153
Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device strings: Mfr=1,
Product=2, SerialNumber=6
Feb 26 20:17:41 nizuc kernel: usb 4-1: Product: USB 10/100/1000 LAN
Feb 26 20:17:41 nizuc kernel: usb 4-1: Manufacturer: Realtek
Feb 26 20:17:41 nizuc kernel: usb 4-1: SerialNumber: 000001
Feb 26 20:17:41 nizuc kernel: usb 4-1: reset SuperSpeed USB device
number 2 using xhci_hcd
Feb 26 20:17:41 nizuc NetworkManager[2049]: <info>  [1519676261.9009]
manager: (eth0): new Ethernet device
(/org/freedesktop/NetworkManager/Devices/5)
Feb 26 20:17:41 nizuc kernel: r8152 4-1:1.0 eth0: v1.09.9
Feb 26 20:17:42 nizuc mtp-probe[3673]: checking bus 4, device 2:
"/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1"
Feb 26 20:17:42 nizuc mtp-probe[3673]: bus: 4, device: 2 was not an MTP device
Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1
Feb 26 20:17:42 nizuc systemd-udevd[3676]: link_config:
autonegotiation is unset or enabled, the speed and duplex are not
writable.
Feb 26 20:17:42 nizuc kernel: r8152 4-1:1.0 enp1s0u1: renamed from eth0
Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2645]
device (eth0): interface index 4 renamed iface from 'eth0' to
'enp1s0u1'
Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
link is not ready
Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2829]
device (enp1s0u1): state change: unmanaged -> unavailable (reason
'managed', internal state 'external')
Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1/4-1:1.0
Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
link is not ready
Feb 26 20:17:46 nizuc kernel: r8152 4-1:1.0 enp1s0u1: carrier on
Feb 26 20:17:46 nizuc kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1:
link becomes ready
```

But once I begin the distributed compile, the network drops, with the
following messages:

```
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV0000000f
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_string) got '-Wa,-mtune=i686'
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_argv) argv[22] = "-Wa,-mtune=i686"
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV00000011
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
event-dma 000000020c8aea40 trb-start 000000020c8aea20 trb-end
000000020c8aea20 seg-start 000000020c8ae000 s>
```

From the xHCI USB3 host controller's point of view, there are a couple of USB
transfer blocks (TRBs) sitting on the transfer ring that were not handled
(at ..ea20 and ..ea30), but we get transfer events for the following transfer
blocks, starting at ..ea40.

The driver can't handle TRBs finishing out of order.

```
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end
```

This was for the next transfer block (at ..ea50), so the xHC hardware
seems to proceed normally after the two TRBs are skipped.
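
The address arithmetic behind that reading can be checked with a small
sketch. It assumes 16-byte TRBs (which matches the ..ea20 / ..ea30 / ..ea40
spacing above) and that all addresses lie in the same ring segment with no
link TRB in between. The function name trbs_skipped is made up for this
illustration and is not part of the driver.

```python
import re

# Assumption: each TRB occupies 16 bytes, consistent with the
# ..ea20 / ..ea30 / ..ea40 spacing seen in the log.
TRB_SIZE = 16

# First "Looking for" line from the log above, truncated fields left out.
LOG = ("xhci_hcd 0000:01:00.0: Looking for event-dma 000000020c8aea40 "
       "trb-start 000000020c8aea20 trb-end 000000020c8aea20")


def trbs_skipped(line):
    """Count TRB slots from trb-start up to (not including) the event TRB.

    Hypothetical helper: ignores segment boundaries and link TRBs.
    """
    fields = dict(re.findall(r"(event-dma|trb-start|trb-end) ([0-9a-f]+)", line))
    event = int(fields["event-dma"], 16)
    start = int(fields["trb-start"], 16)
    return (event - start) // TRB_SIZE


print(trbs_skipped(LOG))  # prints 2 (the TRBs at ..ea20 and ..ea30)
```

For the second error (event-dma at ..ea50) the same calculation gives 3,
because the event for ..ea40 was already reported while trb-start stayed
at ..ea20.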


I hope somebody here can direct me as to how to best proceed. Should I try
modifying the aforementioned patch and see if it helps? (Note that the patch
only activates for certain USB vendor/device IDs.) Also, the Ubuntu link
suggests that this may be the result of "offloading checksuming", which can be
disabled with ethtool. Should I try disabling this and see if it makes a
difference?
Many thanks for your patience. Please let me know if there is any more
information I can provide.

To really know what's happening to those two TRBs I would need xhci traces.
The trace file will be huge. Use a 4.14 or later kernel:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

Then run your distributed compile until you see the
"ERROR Transfer event TRB DMA ptr not part of current TD" message,

and then send me the content of /sys/kernel/debug/tracing/trace
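
Since the trace is expected to be huge, it may be easier to send it
compressed. A minimal sketch, assuming debugfs is mounted as above and the
script runs as root; the helper save_trace and the output name xhci-trace.gz
are made up for this example.

```python
import gzip
import shutil


def save_trace(src="/sys/kernel/debug/tracing/trace", dst="xhci-trace.gz"):
    """Copy the trace buffer into a gzip-compressed file and return its path.

    Hypothetical helper: 'dst' is just an example name.
    """
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Stream in chunks so the whole trace is never held in memory.
        shutil.copyfileobj(fin, fout)
    return dst

# Example call, after the error has shown up in the kernel log:
#   save_trace()
```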

Thanks
-Mathias