Hi
Looks like there are two cases that might be related.
One is that the device seems to send more data than the host asks for.
This triggers a babble error pointing to an already returned transfer, so the error
does not get properly handled.
The other issue is the "WARN Set TR Deq Ptr cmd invalid because of stream ID configuration" error.
The xhci driver queues a Set TR Deq Ptr command when canceling transfers.
This error is shown if the command we queue completes with TRB_ERROR.
I can start working on some debugging patches as well if you have the time to try
them out.
More details inline in the log below:
On 3.9.2024 22.40, Marc SCHAEFER wrote:
Re,
On Tue, Sep 03, 2024 at 05:45:35PM +0200, Michał Pecio wrote:
Hmm, this is possibly not a coincidence, but it's also not the same
"ERROR Transfer event TRB DMA ptr not part of current TD" that happened
Got one:
Sep 3 21:32:58 video kernel: [11408.230466] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
The Set TR Deq command completes with TRB_ERROR, meaning the command the xhci driver queued was faulty.
I'm guessing we somehow mess up the stream ID when the xhci driver creates the TRB.
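For reference, a minimal sketch of where the stream ID sits in the Set TR Deq Ptr command TRB, assuming my reading of the xHCI spec (section 6.4.3.9) is right: bits 31:16 of the third dword. The sketch_* names are made up for illustration and are not the driver's own helpers.

```c
#include <stdint.h>

/*
 * Assumption: the Stream ID field occupies bits 31:16 of dword 2 of the
 * Set TR Dequeue Pointer command TRB (xHCI spec section 6.4.3.9).
 */
#define SKETCH_STREAM_ID_SHIFT	16
#define SKETCH_STREAM_ID_MASK	0xffffu

/* Hypothetical helper: place a stream ID into dword 2 of the command TRB. */
static uint32_t sketch_stream_id_to_dword2(unsigned int stream_id)
{
	return ((uint32_t)stream_id & SKETCH_STREAM_ID_MASK) << SKETCH_STREAM_ID_SHIFT;
}

/* Hypothetical helper: read the stream ID back out of dword 2. */
static unsigned int sketch_dword2_to_stream_id(uint32_t dword2)
{
	return (dword2 >> SKETCH_STREAM_ID_SHIFT) & SKETCH_STREAM_ID_MASK;
}
```

If a debugging patch printed this field for the queued command, a value that doesn't match the endpoint's stream configuration would support the guess above.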
Sep 3 21:32:58 video udisksd[962]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K6KTRC7F: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Sep 3 21:32:58 video kernel: [11408.247189] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
This is a transfer event with comp_code 3, which is "Babble Detected Error", but the event doesn't point to a pending transfer.
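To make the two completion codes discussed here easier to tell apart, a small sketch that maps them to names. The numeric values are the ones from the xHCI spec's completion code table (3 for babble, 5 for TRB error); the sketch_* identifiers are illustrative only and cover just the codes relevant to this log.

```c
/* Subset of xHCI completion codes; values per the xHCI spec. */
enum sketch_comp_code {
	SKETCH_COMP_SUCCESS			= 1,
	SKETCH_COMP_BABBLE_DETECTED_ERROR	= 3,
	SKETCH_COMP_TRB_ERROR			= 5,
};

/* Hypothetical helper: human-readable name for the codes covered above. */
static const char *sketch_comp_code_name(unsigned int code)
{
	switch (code) {
	case SKETCH_COMP_SUCCESS:
		return "Success";
	case SKETCH_COMP_BABBLE_DETECTED_ERROR:
		return "Babble Detected Error";
	case SKETCH_COMP_TRB_ERROR:
		return "TRB Error";
	default:
		return "(not covered by this sketch)";
	}
}
```

So the comp_code 3 in the transfer event is the babble, while the Set TR Deq Ptr warning earlier comes from a command completion with code 5.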
Sep 3 21:32:58 video kernel: [11408.247197] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000d9911140 trb-start 00000000d9911150 trb-end 00000000d9911940 seg-start 00000000d9911000 seg-end 00000000d9911ff0
The "Babble Detected Error" event points to the transfer block at 00000000d9911140,
which is one TRB (16 bytes) before the expected transfer at 00000000d9911150.
This means the Babble Detected Error was intended for the previous transfer, which the xhci driver has
already given back to the class driver.
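The arithmetic above can be sketched as follows. This is only an illustration, assuming the TD sits inside a single ring segment (which holds for the addresses in this log); the sketch_* helpers are hypothetical, and the real check in the driver also has to handle TDs that span segments.

```c
#include <stdint.h>

#define SKETCH_TRB_SIZE	16	/* each xHCI TRB is 16 bytes */

/*
 * Hypothetical helper: is the event's TRB address inside the current TD?
 * Assumes trb_start..trb_end lie within one ring segment.
 */
static int sketch_dma_in_td(uint64_t event_dma, uint64_t trb_start,
			    uint64_t trb_end)
{
	return event_dma >= trb_start && event_dma <= trb_end;
}

/*
 * Hypothetical helper: signed distance, in TRBs, from the first TRB of the
 * current TD to the TRB the event points at. Negative means "before the TD".
 */
static long long sketch_trbs_from_td_start(uint64_t event_dma,
					   uint64_t trb_start)
{
	int64_t delta = (int64_t)event_dma - (int64_t)trb_start;

	return delta / SKETCH_TRB_SIZE;
}
```

With the values from the log (event-dma 00000000d9911140, trb-start 00000000d9911150, trb-end 00000000d9911940) the in-TD check fails and the distance is -1, i.e. the event refers to the TRB immediately before the current TD.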
A Babble error will halt the endpoint, but the xhci driver doesn't recover the endpoint because
the event doesn't map to any pending transfer. This needs to be fixed.
Thanks
Mathias