Re: usb HC busted?

Sudip Mukherjee <sudipm.mukherjee@xxxxxxxxx> · Sun, 3 Jun 2018 20:37:10 +0100

Hi Mathias,

On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote:
> Hi
> 
> On 24.05.2018 00:29, Sudip Mukherjee wrote:
> >Hi Mathias,
> >
> >>>On Fri, May 18, 2018 at 03:55:04PM +0300, Mathias Nyman wrote:
> >>>>Hi,
<snip>
> >>>>
> >>>>
> >>>>Can you enable tracing for xhci and send me the output.
> >>>
> >We have finally reproduced the error while traces were on. The trace and
> >the relevant part of the dmesg (when the error starts) are in:
> >https://drive.google.com/open?id=1PbcYwL1a9ndtHw1MNjE6uVqb0fFX9jV8
> >
> >Could you please have a look and suggest what might be going wrong here?
> >
> 
> The log shows two rings with the same TRB segment DMA address, which will completely mess up the transfer:
> 
> While allocating the rings, the enqueue pointers for the two rings are the same:
> 
> 461.859315: xhci_ring_alloc: ISOC efa4e580: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ...bs
> 461.859320: xhci_ring_alloc: ISOC f0ce1f00: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ...
> 
> URBs for ISOC IN transfers are queued on EP3 at enqueue address (33386000 to 33386140)
> 
> 461.859998: xhci_urb_enqueue: ep3in-isoc: urb f0ec0e00 pipe 4294528 slot 8 length 0/170 sgs 0/0 stream 0 flags 00010302
> 461.860004: xhci_queue_trb: ISOC: Buffer 000000002b480240 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:I:e:c
> 461.860006: xhci_inc_enq: ISOC f0ce1f00: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000
> 
> Later, URBs for ISOC OUT transfers are queued at the same address; this should not happen:
> 
> 461.901175: xhci_urb_enqueue: ep3out-isoc: urb ecec2600 pipe 100096 slot 8 length 0/51 sgs 0/0 stream 0 flags 00010002
> 461.901180: xhci_queue_trb: ISOC: Buffer 000000002d9fa805 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:i:e:c
> 461.901181: xhci_inc_enq: ISOC efa4e580: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000)
> 
> So something goes really wrong when allocating or setting up the rings in one of these functions:
> xhci_ring_alloc()
> xhci_alloc_segments_for_ring()
> xhci_initialize_ring_info()
> xhci_segment_alloc()
> xhci_link_segments()
> dma_pool_zalloc()
> 
> To verify and rule out dma_pool_zalloc(), could you apply the attached patch and reproduce with new logs?

We tested for a full week but still could not reproduce the problem with
the patch applied. We are still trying and will set up automated tests for
this. Since we cannot reproduce it, I wonder whether it is some kind of
race, and whether the extra tracing in the patch has changed the timing
enough that it no longer shows up. I also wonder whether
2b3ff282dff3 ("xhci: Don't add a virt_dev to the devs array before it's fully allocated")
will be of any help to us.
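
For the automated tests, a small script could flag the condition directly
from the trace instead of waiting for the transfer to fail. The following
is only a rough sketch, assuming the xhci_ring_alloc lines keep the format
quoted above; the names RING_ALLOC and find_shared_segments are made up
for illustration and are not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches xhci_ring_alloc trace lines in the format quoted in this thread.
# The layout is taken from the log excerpts above (an assumption; other
# kernel versions may print the event differently).
RING_ALLOC = re.compile(
    r"(?P<ts>\d+\.\d+): xhci_ring_alloc: \S+ (?P<ring>[0-9a-f]+): "
    r"enq 0x[0-9a-f]+\(0x(?P<seg>[0-9a-f]+)\)"
)


def find_shared_segments(lines):
    """Return {segment_dma: [(timestamp, ring), ...]} for every segment
    DMA address that was handed to more than one ring at allocation."""
    rings_by_seg = defaultdict(list)
    for line in lines:
        m = RING_ALLOC.search(line)
        if not m:
            continue  # not a ring allocation event
        seg = int(m.group("seg"), 16)
        ring = m.group("ring")
        # Record each ring only once per segment address.
        if all(ring != seen for _, seen in rings_by_seg[seg]):
            rings_by_seg[seg].append((m.group("ts"), ring))
    return {seg: rings for seg, rings in rings_by_seg.items() if len(rings) > 1}
```

Feeding it the lines of a saved trace file and checking for a non-empty
result would be enough for a pass/fail test. Note that this sketch does not
track ring frees, so an address that is legitimately reused after a ring
has been freed would be flagged as well.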

--
Regards
Sudip