Re: bug report: TRB overflow

Hi Mathias,

Thank you kindly for your feedback; I appreciate the time you took to
help me out.

Unfortunately, in a rather bizarre turn of events, I'm unable to
reproduce the issue, and I'm stumped as to why it no longer occurs.
I've tried a number of things, including the distributed compile, NFS,
and copying data with `dd`; I can see up to a gigabit going over the
network, but I still can't get it to crash! I've tried with the trace
both enabled and disabled. Previously it was consistently reproducible,
occurring within a minute of starting the compile; now it ran for a
good 30 minutes without interruption. Something has obviously changed,
but I can't tell what it is.

Admittedly, I only tried this with 4.14.22 (not 4.9), but the error did
occur with that version a few days ago (I checked the syslogs).

Now that I know what you need to debug this, I'll reply to this thread
if I'm ever able to reproduce it again.

Thanks again,

Luciano

On Fri, Mar 2, 2018 at 12:41 PM, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> On 01.03.2018 22:56, ljoublanc@xxxxxxxxx wrote:
>>
>> I'm a Gentoo Linux user and would like to raise a bug report. I hope
>> these are the correct channels; if not, I apologise in advance.
>>
>> I'm using a Realtek-based USB 3 to RJ45 gigabit adapter. It plugs
>> directly into my laptop, a Skylake-based Toshiba Radius P20W-C-103,
>> with the following controller:
>>
>> ```
>> 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
>> ```
>>
>> I am experiencing this on 4.9.79-r1, and also 4.14.22.
>>
>> When I plug the device in, unless I disable power management on the
>> usb3 and usb4 root hubs, I get 'root hub lost power or was reset'
>> errors. If I disable PM using powertop, the device seems to work well.
>> However, as soon as I start heavy transfers (in my case a distributed
>> compile), the network device stops responding.
>>
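A rough sysfs equivalent of the powertop toggle, as a sketch: it assumes powertop's USB tunables correspond to each device's `power/control` attribute, where `on` keeps the device out of runtime autosuspend and `auto` lets the kernel suspend it. The function name `usb_pm_on` and the `SYSFS_USB` override are hypothetical, added only so the helper can be pointed at a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: set runtime PM to "on" (no autosuspend) for the named
# USB devices, e.g. the usb3/usb4 root hubs from the log. Run as root.
# SYSFS_USB defaults to the real sysfs directory; it can be overridden.
usb_pm_on() {
    base=${SYSFS_USB:-/sys/bus/usb/devices}
    for dev in "$@"; do
        # "on" disables runtime autosuspend; writing "auto" re-enables it
        echo on > "$base/$dev/power/control" || return 1
    done
}
```

As root, `usb_pm_on usb3 usb4` would target the two root hubs named in the log; like any sysfs write, the setting is not persistent across reboots.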
>> The error messages I'm receiving are very similar to the bug described
>> here: https://bugs.launchpad.net/dell-sputnik/+bug/1729674
>> but I've been told that it's unrelated.
>>
>> First, these are the logs showing the dongle being plugged in.
>> ```
>> Feb 26 20:17:09 nizuc kernel: usb usb3: root hub lost power or was reset
>> Feb 26 20:17:09 nizuc kernel: usb usb4: root hub lost power or was reset
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device found, idVendor=0bda, idProduct=8153
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Product: USB 10/100/1000 LAN
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Manufacturer: Realtek
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: SerialNumber: 000001
>> Feb 26 20:17:41 nizuc kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
>> Feb 26 20:17:41 nizuc NetworkManager[2049]: <info>  [1519676261.9009] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/5)
>> Feb 26 20:17:41 nizuc kernel: r8152 4-1:1.0 eth0: v1.09.9
>> Feb 26 20:17:42 nizuc mtp-probe[3673]: checking bus 4, device 2: "/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1"
>> Feb 26 20:17:42 nizuc mtp-probe[3673]: bus: 4, device: 2 was not an MTP device
>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1
>> Feb 26 20:17:42 nizuc systemd-udevd[3676]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
>> Feb 26 20:17:42 nizuc kernel: r8152 4-1:1.0 enp1s0u1: renamed from eth0
>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2645] device (eth0): interface index 4 renamed iface from 'eth0' to 'enp1s0u1'
>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1: link is not ready
>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2829] device (enp1s0u1): state change: unmanaged -> unavailable (reason 'managed', internal state 'external')
>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1/4-1:1.0
>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1: link is not ready
>> Feb 26 20:17:46 nizuc kernel: r8152 4-1:1.0 enp1s0u1: carrier on
>> Feb 26 20:17:46 nizuc kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1: link becomes ready
>> ```
>>
>> But once I begin a distributed compile, the network drops with the
>> following messages:
>>
>> ```
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV0000000f
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_string) got '-Wa,-mtune=i686'
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_argv) argv[22] = "-Wa,-mtune=i686"
>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV00000011
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for event-dma 000000020c8aea40 trb-start 000000020c8aea20 trb-end 000000020c8aea20 seg-start 000000020c8ae000 s>
>> ```
>
>
> From the xHCI (USB 3 host controller) point of view, there are a couple
> of transfer request blocks (TRBs) sitting on the transfer ring that were
> not handled (at ..ea20 and ..ea30), but we get transfer events for the
> following TRBs, starting at ..ea40.
>
> The driver can't handle TRBs finishing out of order.
>
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end
>
>
> This was for the next transfer block (at ..ea50), so the xHC hardware
> seems to proceed normally after the two TRBs are skipped.
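The offsets in these two errors can be checked by hand. xHCI TRBs are 16 bytes each, so dividing the distance from trb-start to event-dma by 16 gives how many TRBs separate the TD the driver expected from the TRB the controller reported. A minimal sketch using shell arithmetic; the helper name `trb_distance` is hypothetical:

```shell
#!/bin/sh
# Count the 16-byte xHCI TRBs between two ring addresses given in hex (0x...).
# Arguments: $1 = trb-start, $2 = event-dma, both taken from the kernel log.
trb_distance() {
    echo $(( ($2 - $1) / 16 ))
}

trb_distance 0x20c8aea20 0x20c8aea40   # first error: prints 2 (..ea20, ..ea30)
trb_distance 0x20c8aea20 0x20c8aea50   # second error: prints 3
```

A distance of 2 matches the two unhandled TRBs at ..ea20 and ..ea30; the second error's 3 is simply the next TRB after ..ea40, which fits the controller carrying on normally.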
>
>>
>> I hope somebody here can direct me as to how best to proceed. Should I
>> try modifying the aforementioned patch to see if it helps? (Note that
>> the patch only activates for certain USB vendor/device IDs.) Also, the
>> Ubuntu link suggests that this may be the result of "offloading
>> checksumming", which can be disabled with ethtool. Should I try
>> disabling this to see if it makes a difference?
>> Many thanks for your patience. Please let me know if there is any more
>> information I can provide.
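For the ethtool question, a sketch of switching off receive and transmit checksum offload, assuming the interface keeps the `enp1s0u1` name from the log. `disable_csum_offload` and the `ETHTOOL` override (which lets `echo` stand in for a dry run) are hypothetical; `ethtool -K <dev> rx off tx off` is the underlying command.

```shell
#!/bin/sh
# Hypothetical helper: disable RX and TX checksum offload on one interface.
# Needs root. ETHTOOL defaults to the real tool; set ETHTOOL=echo for a dry run.
disable_csum_offload() {
    "${ETHTOOL:-ethtool}" -K "$1" rx off tx off
}
```

For example, `disable_csum_offload enp1s0u1` as root; `ethtool -k enp1s0u1` (lowercase `-k`) shows the current offload settings before and after.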
>
>
> To really know what's happening to those two TRBs, I would need xhci
> traces. The trace file will be huge; use a 4.14 or later kernel:
>
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>
> Then run your distributed compile until you see the
> "ERROR Transfer event TRB DMA ptr not part of current TD" message,
> and then send me the contents of /sys/kernel/debug/tracing/trace.
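Because the trace buffer is a ring that keeps overwriting old events, it may help to stop tracing as soon as the error appears, before copying the trace out. A sketch under that assumption; `save_xhci_trace`, the `TRACING` override and the default output name `xhci-trace.txt` are hypothetical, while `tracing_on`, `trace` and `events/xhci-hcd/enable` are standard files under the tracing directory.

```shell
#!/bin/sh
# Hypothetical helper: freeze the trace buffer, save it, then switch off the
# xhci-hcd events. TRACING defaults to the debugfs path used above.
save_xhci_trace() {
    t=${TRACING:-/sys/kernel/debug/tracing}
    out=${1:-xhci-trace.txt}
    echo 0 > "$t/tracing_on"              # stop recording so nothing is overwritten
    cat "$t/trace" > "$out"               # copy the buffer out for mailing
    echo 0 > "$t/events/xhci-hcd/enable"  # disable the xhci-hcd trace events
}
```

Run it as root once the error shows up in `dmesg`, then attach the saved file; `echo 1 > /sys/kernel/debug/tracing/tracing_on` resumes tracing for another attempt.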
>
> Thanks
> -Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


