Re: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 30, 2020 at 10:03:29PM +0300, Mathias Nyman wrote:
> On 30.6.2020 19.58, Fabian Melzow wrote:
> > Hi!
> > 
> > Am Mon, 29 Jun 2020 20:47:24 +0300
> > schrieb Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>:
> > 
> >> First issue I see is that the attempt to recover from a transaction
> >> error with a soft retry isn't working. We expect the hardware to
> >> retry the transfer but nothing seems to happen. Soft retry is
> >> described in xhci specs 4.6.8.1 and is basically a reset endpoint
> >> command with TSP set, followed by ringing the endpoint doorbell.
> >> Traces indicate driver does this correctly but hardware isn't
> >> retrying. We get don't get any event, no error, success or stall.
> >>
> >> This could be hardware flaw.
> >> Any chance you could try this on a xHC from some other vendor?
> > 
> > There is no other xHC hardware available to me.
> > 
> >> Second issue is a driver flaw, when nothing happened for 20 seconds
> >> we see the URB is canceled. xhci driver needs to stop then endpoint
> >> to cancel the URB, but there is a hw race and endpoint ends up halted
> >> instead of stopped. The xhci driver can't handle a halted endpoint in
> >> its stop endpoint handler properly, and the URB is never actually
> >> removed from the ring.
> >>
> >> The reason you see the IO_PAGE_FAULT is probably because once the
> >> ring starts running the driver will handle the cancelled URB, and
> >> touch already freed memory: AMD-Vi: Event logged [IO_PAGE_FAULT
> >> domain=0x000d address=0xdc707028 flags=0x0020]
> >>
> >> I have a patch for this second case, I haven't upstreamed it as it
> >> got some conflicting feedback earlier. It won't solve the 20 second
> >> delay, but should solve the the IO_PAGE_FAULT and the "WARN Set TR
> >> Deq Ptr cmd failed due to incorrect slot or ep state" message
> >>
> >> Can you try it out?
> > 
> > I successful applied the patch against Linux 5.7.4, but get this error when
> > compiling drivers/usb/host/xhci-ring.c:
> > 
> >   CC [M]  drivers/usb/host/xhci-ring.o
> > drivers/usb/host/xhci-ring.c: In function ‘xhci_handle_cmd_stop_ep’:
> > drivers/usb/host/xhci-ring.c:857:3: error: implicit declaration of function ‘xhci_reset_halted_ep’ [-Werror=implicit-function-declaration]
> >   857 |   xhci_reset_halted_ep(xhci, slot_id, ep_index, reset_type);
> >       |   ^~~~~~~~~~~~~~~~~~~~
> > 
> 
> Right, forgot that you need another patch before this.
> 
> both patches attached, also applied to 5.8-rc1 in branch "fix_invalid_context_at_stop_endpoint"
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git fix_invalid_context_at_stop_endpoint
> 
> -Mathias
> 

Hi,

I've been investigating what appears to be an identical problem recently which I think is related
to the description from xhci 4.6.9. I have an fx2 device which stalls its endpoint while at the
same time the URBs are aborted from the software.

Kernel logs which occured when the xhci driver is stucked at the halted endpoint.

Jun 29 17:27:12 dhe-pc kernel: xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
Jun 29 17:27:12 dhe-pc kernel: xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
Jun 29 17:27:12 dhe-pc kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Jun 29 17:27:21 dhe-pc kernel: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 6
Jun 29 17:27:21 dhe-pc kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 00000001173dc630 trb-start 00000001173dcc30 trb-end 0000000000000000 seg-start 00000001173dc000 seg-end 00000001173dcff0
Jun 29 17:27:21 dhe-pc kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 00000001173dc630 trb-start 0000000116e53000 trb-end 0000000116e53830 seg-start 0000000116e53000 seg-end 0000000116e53ff0

I have applied your patch to kernel version 5.8-rc3 and I can confirm that the patch solves
my problem for the halted endpoint.

I still get the following error messages from time to time.

Jul 01 09:56:56 dhe kernel: xhci_hcd 0000:00:14.0: WARN Event TRB for slot 11 ep 4 with no TDs queued?
Jul 01 09:56:56 dhe kernel: xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
Jul 01 09:56:56 dhe kernel: xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

or

Jul 01 10:09:27 dhe kernel: xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
Jul 01 10:09:27 dhe kernel: xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 13
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000473d66d90 trb-start 0000000473d66e70 trb-end 0000000000000000 seg-start 0000000473d66000 seg-end 0000000473d66ff0
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000473d66d90 trb-start 0000000473d65000 trb-end 0000000473d65a70 seg-start 0000000473d65000 seg-end 0000000473d65ff0
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 4 comp_code 13
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000473d66e60 trb-start 0000000473d66e70 trb-end 0000000000000000 seg-start 0000000473d66000 seg-end 0000000473d66ff0
Jul 01 10:09:29 dhe kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000473d66e60 trb-start 0000000473d65000 trb-end 0000000473d65a70 seg-start 0000000473d65000 seg-end 0000000473d65ff0

but the core problem that the xhci is stucked at the halted endpoint state does not occur anymore.
If you are interested in more detailed debug logs I can provide them.

David




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux