On 01.03.2018 22:56, ljoublanc@xxxxxxxxx wrote:
I'm a user of Gentoo Linux and would like to raise a bug report. I hope these are the correct channels; if not, I apologise in advance.

I'm using a Realtek-based USB3 to RJ45 gigabit adapter. This plugs directly into my laptop, which is a Toshiba Radius P20W-C-103, Skylake-based, with the following controller:

```
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
```

I am experiencing this on 4.9.79-r1, and also 4.14.22. When I plug the device in, unless I disable power management on USB hubs 3 and 4, I get errors saying 'root hub lost power or was reset'. However, if I disable PM using powertop, the device seems to work well. But as soon as I start heavy transfers (in my case a distributed compile), the network device stops responding.

The error messages that I'm receiving are very similar to a bug described here: https://bugs.launchpad.net/dell-sputnik/+bug/1729674 But I've been told that it's unrelated.

First, these are the logs showing the dongle being plugged in.
```
Feb 26 20:17:09 nizuc kernel: usb usb3: root hub lost power or was reset
Feb 26 20:17:09 nizuc kernel: usb usb4: root hub lost power or was reset
Feb 26 20:17:41 nizuc kernel: usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device found, idVendor=0bda, idProduct=8153
Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
Feb 26 20:17:41 nizuc kernel: usb 4-1: Product: USB 10/100/1000 LAN
Feb 26 20:17:41 nizuc kernel: usb 4-1: Manufacturer: Realtek
Feb 26 20:17:41 nizuc kernel: usb 4-1: SerialNumber: 000001
Feb 26 20:17:41 nizuc kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
Feb 26 20:17:41 nizuc NetworkManager[2049]: <info> [1519676261.9009] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/5)
Feb 26 20:17:41 nizuc kernel: r8152 4-1:1.0 eth0: v1.09.9
Feb 26 20:17:42 nizuc mtp-probe[3673]: checking bus 4, device 2: "/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1"
Feb 26 20:17:42 nizuc mtp-probe[3673]: bus: 4, device: 2 was not an MTP device
Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1
Feb 26 20:17:42 nizuc systemd-udevd[3676]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 26 20:17:42 nizuc kernel: r8152 4-1:1.0 enp1s0u1: renamed from eth0
Feb 26 20:17:42 nizuc NetworkManager[2049]: <info> [1519676262.2645] device (eth0): interface index 4 renamed iface from 'eth0' to 'enp1s0u1'
Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1: link is not ready
Feb 26 20:17:42 nizuc NetworkManager[2049]: <info> [1519676262.2829] device (enp1s0u1): state change: unmanaged -> unavailable (reason 'managed', internal state 'external')
Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1/4-1:1.0
Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1: link is not ready
Feb 26 20:17:46 nizuc kernel: r8152 4-1:1.0 enp1s0u1: carrier on
Feb 26 20:17:46 nizuc kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1: link becomes ready
```

But once I begin the distributed compile, the network drops, with the following messages:

```
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV0000000f
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_string) got '-Wa,-mtune=i686'
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_argv) argv[22] = "-Wa,-mtune=i686"
Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV00000011
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for event-dma 000000020c8aea40 trb-start 000000020c8aea20 trb-end 000000020c8aea20 seg-start 000000020c8ae000 s>
```
From the xHCI USB3 host controller's point of view, there are a couple of USB transfer request blocks (TRBs) sitting on the transfer ring that were not handled (at ..ea20 and ..ea30), but we get transfer events for the following TRBs, starting at ..ea40. The driver can't handle TRBs finishing out of order.
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end
This was for the next transfer block (at ..ea50), so the xHC hardware seems to proceed normally after the two TRBs are skipped.
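The address arithmetic behind that reading can be sketched in shell. This is only an illustration: `trb_gap` is a hypothetical helper, not part of any kernel tool, and it assumes both addresses lie on the same ring segment. It relies on the fact that xHCI TRBs are 16 bytes each, so consecutive TRBs sit 0x10 apart.

```shell
#!/bin/sh
# Sketch only: trb_gap is a hypothetical helper for reading the log above.
# xHCI TRBs are 16 bytes each, so consecutive TRBs on a ring segment
# sit 0x10 apart (..ea20, ..ea30, ..ea40, ...).
TRB_SIZE=16

# trb_gap EVENT_DMA TRB_START
# Prints how many TRBs lie between the start of the TD the driver was
# waiting on (trb-start) and the TRB the event points to (event-dma).
# Assumption: both addresses are on the same ring segment.
trb_gap() {
    event_dma=$((0x$1))
    trb_start=$((0x$2))
    echo $(( (event_dma - trb_start) / TRB_SIZE ))
}

trb_gap 000000020c8aea40 000000020c8aea20   # first error: prints 2
trb_gap 000000020c8aea50 000000020c8aea20   # second error: prints 3
```

A gap of 2 for the first event matches the two unhandled TRBs at ..ea20 and ..ea30; the gap of 3 for the second event is the same two TRBs plus the one at ..ea40.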
I hope somebody here can direct me as to how best to proceed. Should I try modifying the aforementioned patch and see if it helps? (Note that the patch only activates for certain USB vendor/device IDs.) Also, the Ubuntu link suggests that this may be the result of "offloading checksuming", which can be disabled with ethtool. Should I try disabling this and see if it makes a difference?

Many thanks for your patience. Please let me know if there is any more information I can provide.
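For reference, turning off checksum offload with ethtool could look roughly like the sketch below. Nothing in this thread confirms that it helps with this error. The function names and the `ETHTOOL` variable are hypothetical, chosen for this example; the interface name `enp1s0u1` is taken from the logs above.

```shell
#!/bin/sh
# Sketch only. disable_csum_offload, show_csum_offload and the ETHTOOL
# variable are hypothetical names for this example.
# "ethtool -K IFACE rx off tx off" turns off RX and TX checksum offload.
ETHTOOL="${ETHTOOL:-ethtool}"

# disable_csum_offload IFACE
disable_csum_offload() {
    iface="$1"
    [ -n "$iface" ] || { echo "usage: disable_csum_offload IFACE" >&2; return 1; }
    "$ETHTOOL" -K "$iface" rx off tx off
}

# show_csum_offload IFACE
# Lists the current checksum-related offload settings.
show_csum_offload() {
    "$ETHTOOL" -k "$1" | grep -i checksum
}
```

Run as root, for example `disable_csum_offload enp1s0u1` followed by `show_csum_offload enp1s0u1`. The setting does not persist across a replug of the adapter.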
To really know what's happening to those two TRBs I would need xhci traces. The trace file will be huge. Use a 4.14 or later kernel:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

Run your distributed compile until you see the "ERROR Transfer event TRB DMA ptr not part of current TD" message, and then send me the content of /sys/kernel/debug/tracing/trace

Thanks
-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
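The trace-capture steps in the message above can be wrapped in a small script so they are easy to repeat. This is a minimal sketch, not something from the thread: the `TRACEFS` and `OUT` variables and both function names are hypothetical, and the output path is an arbitrary choice. It must be run as root.

```shell
#!/bin/sh
# Sketch only: TRACEFS, OUT and the function names are hypothetical.
# The paths and values written are the ones given in the message above.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"
OUT="${OUT:-/tmp/xhci-trace.txt}"

start_xhci_trace() {
    # Mount debugfs only if the tracing directory is not already visible.
    [ -d "$TRACEFS" ] || mount -t debugfs none /sys/kernel/debug || return 1
    # Enlarge the trace buffer, then enable all xhci-hcd trace events.
    echo 81920 > "$TRACEFS/buffer_size_kb" || return 1
    echo 1 > "$TRACEFS/events/xhci-hcd/enable" || return 1
}

save_xhci_trace() {
    # Copy the trace buffer to a regular file that can be compressed and sent.
    cat "$TRACEFS/trace" > "$OUT"
}
```

Call `start_xhci_trace` before starting the distributed compile, wait for the "ERROR Transfer event TRB DMA ptr not part of current TD" message to show up in the kernel log, then call `save_xhci_trace`. The saved file will be large, so compressing it before sending is sensible.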