Re: [PATCH 0/3 v4] ISOC supporting for USB 3.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 23, 2010 at 06:36:04PM +0800, Andiry Xu wrote:
> On Thu, 2010-04-22 at 12:43 -0700, Sarah Sharp wrote:
> > On Tue, Apr 20, 2010 at 05:54:41PM +0800, Libin Yang wrote:
> > > This is the version 4 for ISOC patches. 
> > > 
> > > The difference from v3 is to add a transfer ring check in 
> > > xhci_queue_isoc_tx_prepare() before inserting any tds of the urb to the ring.
> > 
> > I tested this on my FS webcam that was causing a scheduling while atomic
> > oops on the last version of the patch.  With the current 3 isochronous
> > patches (but without your patch to allocate bigger ring for isochronous
> > endpoints), the driver runs out of room on the endpoint ring (expected)
> > and does not oops the system.  I see the isochronous transfers
> > successfully canceled, so you've fixed that bug. :)
> > 
> > With your patch to enlarge the isoc transfer ring, I successfully get
> > video out of the FS webcam!
> > 
> > Unfortunately, I tried it with my HS webcam, and that does not work.
> > The log file gets garbled with the amount of messages coming from the
> > driver, but I see a lot of "ERROR Transfer event TRB DMA ptr not part of
> > current TD" messages.  It seems like the driver and the host controller
> > got out of sync somewhere, and the host was giving back transfers for
> > isoc TDs, and the driver was skipping over them.  But the log file where
> > the transfers were enqueued was garbled, so I can't see exactly what's
> > happening.
> > 
> > I tried applying a patch to turn off a lot of the debugging, and then
> > the webcam worked, so I'm not sure if it's an intermittent failure.  I
> > need to test more.  In the mean time, the log file I have from the high
> > speed webcam failure is at
> > 
> > http://minilop.net/~sarah/isoc-hs-webcam-v4-patches.tar.gz
> > 
> > Maybe you can see the problem quicker than I?
> > 
> 
> Hmm... I'm curious about this. Can you reproduce this issue every time
> with debug message on? 
> 
> Can you add some debug message to print out the following information
> when the error happens?
> The address of ep_ring->dequeue, td->last_trb, event and event_dma, and
> call xhci_debug_ring(xhci, xhci->event_ring) and xhci_debug_ring(xhci,
> ep_ring).

I caught the bug again, but unfortunately, I had xHCI debugging turned
off at the time and I only got the warning/error level messages.  I
turned off the len=0 bug fix I had added to get your compiler warning to
go away, so I'm not sure if the error is related to the bug you need to
fix in your next patchset.

Here's a snippet of the output I captured:

Apr 27 13:42:07 xanatos kernel: [12041.969986] usb 1-3: Installing alternate setting 11
Apr 27 13:42:07 xanatos kernel: [12041.970221] usb 1-3: ep 0x81 - rounding interval to 1 microframes
Apr 27 13:42:09 xanatos kernel: [12044.124762] usb 1-3: Installing alternate setting 0
Apr 27 13:42:09 xanatos kernel: [12044.124863] xhci_hcd 0000:05:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued?
Apr 27 13:42:09 xanatos kernel: [12044.124868] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:09 xanatos kernel: [12044.125804] usb 1-3: Installing alternate setting 0
Apr 27 13:42:11 xanatos kernel: [12046.455090] usb 1-3: Installing alternate setting 5
Apr 27 13:42:11 xanatos kernel: [12046.455340] usb 1-3: ep 0x81 - rounding interval to 1 microframes
Apr 27 13:42:16 xanatos kernel: [12051.595984] xhci_queue_intr_tx: 6 callbacks suppressed
Apr 27 13:42:37 xanatos kernel: [12072.581713] usb 1-3: Installing alternate setting 0
Apr 27 13:42:37 xanatos kernel: [12072.667674] usb 1-3: Installing alternate setting 0
Apr 27 13:42:45 xanatos kernel: [12080.229229] usb 1-3: Installing alternate setting 11
Apr 27 13:42:45 xanatos kernel: [12080.229479] usb 1-3: ep 0x81 - rounding interval to 1 microframes

Apr 27 13:42:45 xanatos kernel: [12080.477101] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Apr 27 13:42:45 xanatos kernel: [12080.477114] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:45 xanatos kernel: [12080.477121] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870
Apr 27 13:42:45 xanatos kernel: [12080.477127] trb_in_td - suspect DMA = 0x0
Apr 27 13:42:45 xanatos kernel: [12080.477134] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0

Apr 27 13:42:45 xanatos kernel: [12080.477269] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Apr 27 13:42:45 xanatos kernel: [12080.477279] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:45 xanatos kernel: [12080.477285] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870
Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
Apr 27 13:42:45 xanatos kernel: [12080.477298] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0

Apr 27 13:42:45 xanatos kernel: [12080.477269] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Apr 27 13:42:45 xanatos kernel: [12080.477279] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:45 xanatos kernel: [12080.477285] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870
Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
Apr 27 13:42:45 xanatos kernel: [12080.477298] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0

Apr 27 13:42:45 xanatos kernel: [12080.477392] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Apr 27 13:42:45 xanatos kernel: [12080.477399] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:45 xanatos kernel: [12080.477405] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870
Apr 27 13:42:45 xanatos kernel: [12080.477411] trb_in_td - suspect DMA = 0x3cd9d8a0
Apr 27 13:42:45 xanatos kernel: [12080.477418] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0

Apr 27 13:42:45 xanatos kernel: [12080.477518] xhci_hcd 0000:05:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Apr 27 13:42:45 xanatos kernel: [12080.477525] xhci_hcd 0000:05:00.0: Event TRB with TRB type ID 32
Apr 27 13:42:45 xanatos kernel: [12080.477531] xhci_hcd 0000:05:00.0: Current td->first_trb (DMA) = 0x3cd9d870
Apr 27 13:42:45 xanatos kernel: [12080.477537] trb_in_td - suspect DMA = 0x3cd9d8b0
Apr 27 13:42:45 xanatos kernel: [12080.477543] trb_in_td - start DMA = 0x3cd9d870, end DMA = 0x3cd9d870,seg end DMA = 0x3cd9dbf0

The full log is here:

http://minilop.net/~sarah/logs/isoc-and-pm-2010-04-27-13-38.txt

What the log shows is that the hardware gave back a transfer completion
with a DMA pointer of all zeros (the suspect DMA = 0x0 line).  I can't
tell what the completion code was, only that the endpoint ring that was
indexed in the event had at least one TD on the ring.

After the transfer event with the zeroed DMA pointer, the host attempted
to give back several TDs.  It looks like those events are for the same
isoc ring, after the TD at the head of the endpoint list.  The ring
handler doesn't complete the queued TD when it receives an event that
looks erroneous, so the endpoint gets stuck at one TD.


Now, I see there's this note in the xHCI spec:

"Note: The TRB Pointer field of Missed Service Error Transfer Event may
be ‘0’."

Section 4.10.3.2 talks about what happens when the host can't transfer
the data within the requested service interval, and it says that it may
give a transfer event completion code of "Missed Service Error" with a
DMA pointer of zero.  I wonder if that's what I'm seeing?

The DMA pointers in the transfer completion events do look a bit weird
even with that theory:

Apr 27 13:42:45 xanatos kernel: [12080.477127] trb_in_td - suspect DMA = 0x0
Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
Apr 27 13:42:45 xanatos kernel: [12080.477411] trb_in_td - suspect DMA = 0x3cd9d8a0
Apr 27 13:42:45 xanatos kernel: [12080.477537] trb_in_td - suspect DMA = 0x3cd9d8b0

Why would it giveback 0x3cd9d890 twice?

I'm also wondering about this line:

Apr 27 13:42:09 xanatos kernel: [12044.124863] xhci_hcd 0000:05:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued?

If this is related to the missed service error, then you need to add
checking for that completion code before attempting to find the endpoint
segment that the event came from.  So there would need to be some
completion code checking before this chunk of code:

	/* Is this a TRB in the currently executing TD? */
	event_seg = trb_in_td(ep_ring->deq_seg, ep_ring->dequeue,
			td->last_trb, event_dma, false);
	if (!event_seg) {
		/* HC is busted, give up! */
		xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not part of current TD\n");

The xHCI driver should just giveback the first TD in the list when it
sees that completion code, since the host is going to advance to the
next TD.  I'm not sure what completion code you should give the URB.
Maybe -EREMOTEIO for a short transfer and set the transferred_length to
zero?

I'll try to reproduce this with full debugging turned on, but if it is a
missed service error, then it's probably subject to hardware/bus race
conditions and may not be easy to trigger.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux