Re: bug report: TRB overflow

Hi Again Mathias,

I did a system update yesterday (using gentoo, so I build everything
with distcc), and managed to get it to crash.

[48292.615897] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr
not part of current TD ep_index 2 comp_code 13
[48292.615902] xhci_hcd 0000:01:00.0: Looking for event-dma
00000001d91f5050 trb-start 00000001d91f5030 trb-end 00000001d91f5030
seg-start 00000001d91f5000 seg-end 00000001d91f5ff0
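
The addresses in a message like the one above can be compared
directly. What follows is only a minimal Python sketch, not part of the
kernel or of any xHCI tooling; the names parse_xhci_lookup and
describe_event are made up for illustration. It assumes the 16-byte TRB
size defined by the xHCI specification, and it reports whether the
event address lies inside the TD the driver expected, whether it lies
inside the ring segment, and how many TRBs past trb-start it is.

```python
import re

TRB_SIZE = 16  # bytes per TRB, per the xHCI specification

FIELDS = ("event-dma", "trb-start", "trb-end", "seg-start", "seg-end")


def parse_xhci_lookup(line):
    """Extract the hex addresses from a 'Looking for event-dma ...' message."""
    values = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r"\s+([0-9a-fA-F]+)", line)
        if match:
            values[name] = int(match.group(1), 16)
    return values


def describe_event(values):
    """Return (in_td, in_segment, trbs_past_start) for a parsed message."""
    event = values["event-dma"]
    in_td = values["trb-start"] <= event <= values["trb-end"]
    in_segment = values["seg-start"] <= event <= values["seg-end"]
    trbs_past_start = (event - values["trb-start"]) // TRB_SIZE
    return in_td, in_segment, trbs_past_start


# The message from the log above, joined onto one line.
line = ("xhci_hcd 0000:01:00.0: Looking for event-dma 00000001d91f5050 "
        "trb-start 00000001d91f5030 trb-end 00000001d91f5030 "
        "seg-start 00000001d91f5000 seg-end 00000001d91f5ff0")

in_td, in_segment, trbs_past_start = describe_event(parse_xhci_lookup(line))
print(in_td, in_segment, trbs_past_start)
```

For this message the sketch reports that the event is outside the
expected TD but inside the ring segment, two TRBs past trb-start.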

I'm suspecting that this happens when I use a combination of USB
devices; this happened when I was browsing the web while compiling.
Also it happened _after_ I'd finished compiling all packages, while I
was running something over NFS - I don't believe the network
throughput was very high at that point.

The trace file compresses to about 30 MB (about 800 MB uncompressed), so I
will send you a link directly.
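
In case it is useful to anyone else collecting such a trace, here is a
rough Python sketch of how a large trace could be compressed without
reading it all into memory. The function name compress_trace is
hypothetical, and gzip is just one choice; this is not necessarily how
the file mentioned above was produced.

```python
import gzip
import shutil


def compress_trace(src_path, dst_path):
    """Stream src_path into a gzip file at dst_path, chunk by chunk."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

It could be called as compress_trace("/sys/kernel/debug/tracing/trace",
"xhci-trace.gz"), where the output file name is arbitrary and reading
the trace file normally requires root.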

Luciano


On Sun, Mar 4, 2018 at 3:07 PM,  <ljoublanc@xxxxxxxxx> wrote:
> Hi Mathias,
>
> Thank you kindly for your feedback - I appreciate the time you took to
> help me out.
>
> Unfortunately, in a rather bizarre turn of events, I'm unable to
> reproduce the issue. I'm stumped as to why it's not occurring any
> more. I've tried a number of things including the distributed compile,
> nfs, copying data with `dd`, and I can see up to a gigabit going over
> the network, but still I can't get it to crash! I've tried with both
> the trace enabled and disabled. Previously it was consistently
> reproducible, occurring under a minute after I began the compile - now
> it ran for a good 30 mins without interruption. Something must have
> obviously changed, but I can't tell what it is.
>
> Admittedly I did only try this with 4.14.22 (not 4.9), but this did
> occur with the later version a few days ago (I checked syslogs).
>
> Now that I know what it is that you need to debug this, I will respond
> to this thread if I'm ever able to reproduce this again.
>
> Thanks again,
>
> Luciano
>
> On Fri, Mar 2, 2018 at 12:41 PM, Mathias Nyman
> <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>> On 01.03.2018 22:56, ljoublanc@xxxxxxxxx wrote:
>>>
>>> I'm a user of gentoo linux, and would like to raise a bug report. I hope
>>> these are the correct channels; if not, I apologise in advance.
>>>
>>> I'm using a realtek-based USB3 to RJ45 gigabit adapter. This plugs
>>> directly into my laptop, which is a Toshiba Radius P20W-C-103, skylake
>>> based, with the following controller:
>>>
>>> ```
>>> 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0
>>> xHCI Controller (rev 21)
>>> ```
>>>
>>> I am experiencing this on 4.9.79-r1, and also 4.14.22.
>>>
>>> When I plug the device in, unless I disable power management on USB
>>> hubs 3 and 4, I get errors saying 'root hub lost power or was reset'.
>>> However, if I disable PM using powertop, I get the device to work
>>> seemingly well. But, as soon as I start heavy transfers (in my case
>>> distributed compile), the network device stops responding.
>>>
>>> The error messages that I'm receiving are very similar to a bug described
>>> here: https://bugs.launchpad.net/dell-sputnik/+bug/1729674
>>> But I've been told that it's unrelated.
>>>
>>> First, these are the logs showing the dongle being plugged in.
>>> ```
>>> Feb 26 20:17:09 nizuc kernel: usb usb3: root hub lost power or was reset
>>> Feb 26 20:17:09 nizuc kernel: usb usb4: root hub lost power or was reset
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: new SuperSpeed USB device
>>> number 2 using xhci_hcd
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device found,
>>> idVendor=0bda, idProduct=8153
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: New USB device strings: Mfr=1,
>>> Product=2, SerialNumber=6
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Product: USB 10/100/1000 LAN
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: Manufacturer: Realtek
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: SerialNumber: 000001
>>> Feb 26 20:17:41 nizuc kernel: usb 4-1: reset SuperSpeed USB device
>>> number 2 using xhci_hcd
>>> Feb 26 20:17:41 nizuc NetworkManager[2049]: <info>  [1519676261.9009]
>>> manager: (eth0): new Ethernet device
>>> (/org/freedesktop/NetworkManager/Devices/5)
>>> Feb 26 20:17:41 nizuc kernel: r8152 4-1:1.0 eth0: v1.09.9
>>> Feb 26 20:17:42 nizuc mtp-probe[3673]: checking bus 4, device 2:
>>> "/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1"
>>> Feb 26 20:17:42 nizuc mtp-probe[3673]: bus: 4, device: 2 was not an MTP
>>> device
>>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
>>> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1
>>> Feb 26 20:17:42 nizuc systemd-udevd[3676]: link_config:
>>> autonegotiation is unset or enabled, the speed and duplex are not
>>> writable.
>>> Feb 26 20:17:42 nizuc kernel: r8152 4-1:1.0 enp1s0u1: renamed from eth0
>>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2645]
>>> device (eth0): interface index 4 renamed iface from 'eth0' to
>>> 'enp1s0u1'
>>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
>>> link is not ready
>>> Feb 26 20:17:42 nizuc NetworkManager[2049]: <info>  [1519676262.2829]
>>> device (enp1s0u1): state change: unmanaged -> unavailable (reason
>>> 'managed', internal state 'external')
>>> Feb 26 20:17:42 nizuc upowerd[2168]: unhandled action 'bind' on
>>> /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1/4-1:1.0
>>> Feb 26 20:17:42 nizuc kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0u1:
>>> link is not ready
>>> Feb 26 20:17:46 nizuc kernel: r8152 4-1:1.0 enp1s0u1: carrier on
>>> Feb 26 20:17:46 nizuc kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1:
>>> link becomes ready
>>> ```
>>>
>>> But once I begin the distributed compile, the network drops, with the
>>> following messages:
>>>
>>> ```
>>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV0000000f
>>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_string) got
>>> '-Wa,-mtune=i686'
>>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_argv) argv[22] =
>>> "-Wa,-mtune=i686"
>>> Feb 26 20:25:32 nizuc distccd[13928]: (dcc_r_token_int) got ARGV00000011
>>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
>>> event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
>>> event-dma 000000020c8aea40 trb-start 000000020c8aea20 trb-end
>>> 000000020c8aea20 seg-start 000000020c8ae000 s>
>>
>>
>> From the xHCI USB3 host controller's point of view, there are a couple of
>> transfer request blocks (TRBs) sitting on the transfer ring that were not
>> handled (at ..ea20 and ..ea30), but we get transfer events for the
>> following TRBs, starting at ..ea40.
>>
>> The driver can't handle TRBs finishing out of order.
>>
>>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: ERROR Transfer
>>> event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
>>> Feb 26 20:25:32 nizuc kernel: xhci_hcd 0000:01:00.0: Looking for
>>> event-dma 000000020c8aea50 trb-start 000000020c8aea20 trb-end
>>
>>
>> This was for the next transfer block (at ..ea50), so the xHC hardware
>> seems to proceed normally after the two TRBs are skipped.
>>
>>>
>>> I hope somebody here can direct me as to how to best proceed. Should I try
>>> modifying the aforementioned patch and see if it helps? (Note that the
>>> patch only activates for certain USB vendor/device IDs.) Also, the Ubuntu
>>> link suggests that this may be the result of "offloading checksumming",
>>> which can be disabled with ethtool. Should I try disabling this and see if
>>> it makes a difference?
>>> Many thanks for your patience. Please let me know if there is any more
>>> information I can provide.
>>
>>
>> To really know what's happening to those two TRBs, I would need xhci traces.
>> The trace file will be huge; use a 4.14 or later kernel.
>>
>> mount -t debugfs none /sys/kernel/debug
>> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
>> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>>
>> Then run your distributed compile until you see the
>> "ERROR Transfer event TRB DMA ptr not part of current TD"
>>
>> and then send me the content of /sys/kernel/debug/tracing/trace
>>
>> Thanks
>> -Mathias