On 5.9.2024 20.24, Marc SCHAEFER wrote:
On Thu, Sep 05, 2024 at 05:06:15PM +0200, Marc SCHAEFER wrote:
[ 3040.780444] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
[ 3040.828021] scsi host16: uas_eh_device_reset_handler success
[ 4808.165397] xhci_hcd 0000:01:00.0: Cancel URB 00000000bcd65caa, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd9964960
[ 4808.439706] xhci_hcd 0000:01:00.0: Cancel URB 000000000d48b0aa, dev 1.1.4, ep 0x85, stream_id 6 starting at offset 0x12c7ef0
[ 4808.439981] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[ 4808.439984] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 00000000012c7f02 stream 6 slot 12 ep 11 flags c
[ 4808.450328] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[ 4808.450339] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c7f00 trb-end 0000000000000000 seg-start 00000000012c7000 seg-end 00000000012c7ff0
[ 4808.450342] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c6000 trb-end 00000000012c6700 seg-start 00000000012c6000 seg-end 00000000012c6ff0
[ 4808.450345] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
Thanks.
So the failed 'Set TR Deq' command is the start of the issue.
xHC host fails to move the past that cancelled transfer, but xhci driver gives
back the URB anyway. Once ring continues running it starts from that cancelled
transfer, triggering a babble error, pointing to that given back TD.
Still unclear why that 'Set TR Deq' command failed in the first place.
The 'Set TR Deq' command TRB looks fine. Stream ID is ok, (6) and Stream Context
type (SCT) which is bits 3:1 of dequeue 00000000012c7f02 is looks fine (=1, which
means primary stream array with dequeue pointing to transfer ring)
Maybe stream is dropped, or something else is messing with endpoint context
MaxPSstream, or Linear Stream Address (LSA) fields before the 'Set TD Deq' command.
That would trigger stream ID issues like this.
Can I ask you to add some select traces on top of the patch?
They should show content of endpoint context:
as root:
mount -t debugfs none /sys/kernel/debug/tracing
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_command/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_set_deq/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_set_deq_ep/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_config_ep/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_configure_endpoint/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_configure_endpoint_ctrl_ctx/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
<reproduce issue>
Send content of /sys/kernel/debug/tracing/trace
Sample output:
<idle>-0 [013] d.h2. 1606.344913: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000116f65921 stream 0 slot 4 ep 1 flags C
<idle>-0 [013] d.h2. 1606.344915: xhci_handle_cmd_set_deq: RS 00000 full-speed Ctx Entries 9 MEL 0 us Port# 6/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 4 State configured
<idle>-0 [013] d.h2. 1606.344917: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Ctrl burst 0 maxp 64 deq 0000000116f65921 avg trb len 0
<idle>-0 [013] d.h2. 1606.376754: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 1 flags t:C
And then send me both the trace and dmesg of triggered case.
Thanks
Mathias