Hi Mathias,

Thank you kindly for your feedback. I appreciate the time you took to help me out.

Unfortunately, in a rather bizarre turn of events, I'm unable to reproduce the issue. I'm stumped as to why it's no longer occurring. I've tried a number of things, including the distributed compile, NFS, and copying data with `dd`, and I can see up to a gigabit per second going over the network, but I still can't get it to crash! I've tried with the trace both enabled and disabled.

Previously it was consistently reproducible, occurring less than a minute after I began the compile; this time it ran for a good 30 minutes without interruption. Something has obviously changed, but I can't tell what.

Admittedly I only tried this with 4.14.22 (not 4.9), but the issue did occur with the later version a few days ago (I checked the syslogs).

Now that I know what you need to debug this, I will reply to this thread if I'm ever able to reproduce it again.

Thanks again,
Luciano

On Fri, Mar 2, 2018 at 12:41 PM, Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> On 01.03.2018 22:56, ljoublanc@xxxxxxxxx wrote:
>>
>> I'm a user of Gentoo Linux, and would like to raise a bug report. I hope these
>> are the correct channels; if not, I apologise in advance.
>>
>> I'm using a Realtek-based USB3 to RJ45 gigabit adapter. This plugs
>> directly into my laptop, which is a Toshiba Radius P20W-C-103, Skylake-based,
>> with the following controller:
>>
>> ```
>> 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0
>> xHCI Controller (rev 21)
>> ```
>>
>> I am experiencing this on 4.9.79-r1, and also 4.14.22.
>>
>> When I plug the device in, unless I disable power management on USB
>> hubs 3 and 4, I get errors saying 'root hub lost power or was reset'.
>> However, if I disable PM using powertop, I get the device to work
>> seemingly well.
>> But, as soon as I start heavy transfers (in my case
>> distributed compile), the network device stops responding.
>>
>> The error messages that I'm receiving are very similar to a bug described
>> here: https://bugs.launchpad.net/dell-sputnik/+bug/1729674
>> But I've been told that it's unrelated.
>>
>> First, these are the logs showing the dongle being plugged in.
>> ```
>> Feb 26 20:17:09 nizuc kernel: usb usb3: root hub lost power or was reset
>> Feb 26 20:17:09 nizuc kernel: usb usb4: root hub lost power or was reset
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: new SuperSpeed USB device
>> number 2 using xhci_hcd
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device found,
>> idVendor=0bda, idProduct=8153
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device strings: Mfr=1,
>> Product=2, SerialNumber=6
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Product: USB 10/100/1000 LAN
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Manufacturer: Realtek
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: SerialNumber: 000001
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: reset SuperSpeed USB device
>> number 2 using xhci_hcd
>> Feb 26 20:17:41 nizuc NetworkManager[2049]: <info> [1519676261.9009]
>> manager: (eth0): new Ethernet device
>> (/org/freedesktop/NetworkManager/Devices/5)
>> Feb 26 20:17:41 nizuc kernel: r8152 4-1:1.0 eth0: v1.09.9
>> Feb 26 20:17:42 nizuc mtp-probe[3673]: checking bus 4, device 2:
>> "/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1"
>> Feb 26 20:17:42 nizuc mtp-probe[3673]: bus: 4, device: 2 was not an MTP
>> device
>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
>> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1
>> Feb 26 20:17:42 nizuc systemd-udevd[3676]: link_config:
>> autonegotiation is unset or enabled, the speed and duplex are not
>> writable.
>> Feb 26 20:17:42 nizuc kernel: r8152 4-1:1.0 enp1s0u1: renamed from eth0
>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info> [1519676262.2645]
>> device (eth0): interface index 4 renamed iface from 'eth0' to
>> 'enp1s0u1'
>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
>> link is not ready
>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info> [1519676262.2829]
>> device (enp1s0u1): state change: unmanaged -> unavailable (reason
>> 'managed', internal state 'external')
>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
>> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1/4-1:1.0
>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
>> link is not ready
>> Feb 26 20:17:46 nizuc kernel: r8152 4-1:1.0 enp1s0u1: carrier on
>> Feb 26 20:17:46 nizuc kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1:
>> link becomes ready
>> ```
>>
>> But once I begin the distributed compile, the network drops, with the
>> following messages:
>>
>> ```
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV0000000f
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_string) got
>> '-Wa,-mtune=i686'
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_argv) argv[22] =
>> "-Wa,-mtune=i686"
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV00000011
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
>> event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
>> event-dma 000000020c8aea40 trb-start 000000020c8aea20 trb-end
>> 000000020c8aea20 seg-start 000000020c8ae000 s>
>
>
> From the xHCI USB3 host controller's point of view, there are a couple of
> USB transfer blocks (TRBs) sitting on the transfer ring that were not
> handled (at ..ea20 and ..ea30), but we get transfer events for the
> following transfer blocks, starting at ..ea40.
>
> The driver can't handle TRBs finishing out of order.
>
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
>> event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
>> event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end
>
>
> This was for the next transfer block (at ..ea50), so the xHC hardware
> seems to proceed normally after two TRBs are skipped.
>
>>
>> I hope somebody here can direct me as to how best to proceed. Should I try
>> modifying the aforementioned patch and see if it helps? (Note that the patch
>> only activates for certain USB vendor/device IDs.) Also, the Ubuntu link
>> suggests that this may be the result of "offloading checksumming", which can
>> be disabled with ethtool. Should I try disabling this and see if it makes a
>> difference?
>> Many thanks for your patience. Please let me know if there is any more
>> information I can provide.
>
>
> To really know what's happening to those two TRBs I would need xhci traces.
> The trace file will be huge; use a 4.14 or later kernel.
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> Run your distributed compile until you see the
> "ERROR Transfer event TRB DMA ptr not part of current TD"
>
> and then send me the content of /sys/kernel/debug/tracing/trace
>
> Thanks
> -Mathias
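
For anyone reading the "Looking for event-dma ... trb-start ..." lines later, the arithmetic behind the "two TRBs skipped" reading can be sketched as below. This is only an illustration, not part of the xhci driver: the function name `trbs_ahead` and the regular expression are made up for this example, and the only outside fact it relies on is that an xHCI TRB is 16 bytes long. The sample text is taken from the two log messages quoted above.

```python
import re

# Size of one TRB in bytes (xHCI TRBs are 16 bytes long).
TRB_SIZE = 16

# Hypothetical pattern for the "Looking for event-dma ... trb-start ..." message.
PATTERN = re.compile(r"event-dma ([0-9a-f]+) trb-start ([0-9a-f]+)")


def trbs_ahead(log_text):
    """For each matching message, return how many TRBs event-dma lies past trb-start."""
    results = []
    for event_dma, trb_start in PATTERN.findall(log_text):
        offset = int(event_dma, 16) - int(trb_start, 16)
        results.append(offset // TRB_SIZE)
    return results


# The two messages from the log above, with syslog prefixes removed.
sample = (
    "Looking for event-dma 000000020c8aea40 trb-start 000000020c8aea20 "
    "trb-end 000000020c8aea20\n"
    "Looking for event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end\n"
)

print(trbs_ahead(sample))  # [2, 3]
```

The first event points two TRBs past trb-start (so ..ea20 and ..ea30 had no event yet), and the second points at the TRB right after that, which is consistent with the controller carrying on normally after the gap.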