Re: [PATCH 0/3 v4] ISOC supporting for USB 3.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 29, 2010 at 02:51:50AM +0800, Andiry Xu wrote:
> On Tue, 2010-04-27 at 14:59 -0700, Sarah Sharp wrote:
> > What the log shows is that the hardware gave back a transfer completion
> > with a DMA pointer of all zeros (the suspect DMA = 0x0 line).  I can't
> > tell what the completion code was, only that the endpoint ring that was
> > indexed in the event had at least one TD on the ring.
> > 
> > After the transfer event with the zeroed DMA pointer, the host attempted
> > to give back several TDs.  It looks like those events are for the same
> > isoc ring, after the TD at the head of the endpoint list.  The ring
> > handler doesn't complete the queued TD when it receives an event that
> > looks erroneous, so the endpoint gets stuck at one TD.
> > 
> >From the log, that's possibly what happens.
> 
> > 
> > Now, I see there's this note in the xHCI spec:
> > 
> > "Note: The TRB Pointer field of Missed Service Error Transfer Event may
> > be ‘0’."
> > 
> > Section 4.10.3.2 talks about what happens when the host can't transfer
> > the data within the requested service interval, and it says that it may
> > give a transfer event completion code of "Missed Service Error" with a
> > DMA pointer of zero.  I wonder if that's what I'm seeing?
> > 
> Probably. Ring Overrun and Underrun event also have DMA pointer of zero.
> But it should be processed within the patch.

Not necessarily, see below.

> > The DMA pointers in the transfer completion events do look a bit weird
> > even with that theory:
> > 
> > Apr 27 13:42:45 xanatos kernel: [12080.477127] trb_in_td - suspect DMA = 0x0
> > Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
> > Apr 27 13:42:45 xanatos kernel: [12080.477291] trb_in_td - suspect DMA = 0x3cd9d890
> > Apr 27 13:42:45 xanatos kernel: [12080.477411] trb_in_td - suspect DMA = 0x3cd9d8a0
> > Apr 27 13:42:45 xanatos kernel: [12080.477537] trb_in_td - suspect DMA = 0x3cd9d8b0
> > 
> > Why would it giveback 0x3cd9d890 twice?
> 
> I only see 0x3cd9d890 in the full log for once. Please note the time
> stamp - both 12080.477291. Maybe you just copy it for twice?

Probably. :)

> > I'm also wondering about this line:
> > 
> > Apr 27 13:42:09 xanatos kernel: [12044.124863] xhci_hcd 0000:05:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued?
> > 
> This line is strange to be here. A missed service error indicates the
> xHC was unable to complete the data transfer associated with an Isoch
> TRB within the ESIT. but if an endpoint has no tds queued, what does the
> xHC has to process? 

If the driver got stuck on one isoc TD, and the host processed all the
TDs after that one, then there will still be TDs in the endpoint's TD
list.  So this code will never run:


	if (list_empty(&ep_ring->td_list)) {
		/* When the Isoch ring is empty, the xHC will generate
		 * a Ring Overrun Event for IN Isoch endpoint or Ring
		 * Underrun Event for OUT Isoch endpoint.
		 */
		switch (trb_comp_code) {
		case COMP_UNDERRUN:
			xhci_dbg(xhci, "underrun event on endpoint\n");
			urb = NULL;
			goto cleanup;
		case COMP_OVERRUN:
			xhci_dbg(xhci, "overrun event on endpoint\n");
			urb = NULL;
			goto cleanup;
		default:
			break;
		}
		xhci_warn(xhci, "WARN Event TRB for slot %d ep %d "
				"with no TDs queued?\n",
				TRB_TO_SLOT_ID(event->flags), ep_index);
		xhci_warn(xhci, "Event TRB with TRB type ID %u\n",
				(unsigned int)
				(event->flags & TRB_TYPE_BITMASK)>>10);
		xhci_print_trb_offsets(xhci, (union xhci_trb *) event);
		urb = NULL;
		goto cleanup;
	}

You may want to move the checking for COMP_UNDERRUN and COMP_OVERRUN
outside that if statement.

> > If this is related to the missed service error, then you need to add
> > checking for that completion code before attempting to find the endpoint
> > segment that the event came from.  So there would need to be some
> > completion code checking before this chunk of code:
> > 
> > 	/* Is this a TRB in the currently executing TD? */
> > 	event_seg = trb_in_td(ep_ring->deq_seg, ep_ring->dequeue,
> > 			td->last_trb, event_dma, false);
> > 	if (!event_seg) {
> > 		/* HC is busted, give up! */
> > 		xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not part of current TD\n");
> > 
> > The xHCI driver should just giveback the first TD in the list when it
> > sees that completion code, since the host is going to advance to the
> > next TD.  I'm not sure what completion code you should give the URB.
> > Maybe -EREMOTEIO for a short transfer and set the transferred_length to
> > zero?
> > 
> In this case, it should be treated as a short transfer and the
> actual_length should be set to 0. But only to the corresponding packet,
> not the whole URB.

Yes, true.

> I have been trying different USB isoc devices these days, and finally I
> seem to be able to reproduce the issue with Hauppauge WinTV-HVR950,
> which is a USB2.0 TV tuner. I used tvtime to play TV. It works fine, but
> sometimes when I quit the application, it shows "ERROR Transfer event
> TRB DMA ptr not part of current TD" message. Sorry the log file is too
> big and I only show the interesting part.
> 
> Here is the suspicious part I catch:   
> 
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388580] xhci_hcd 0000:02:00.0: Miss service interval error
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388582] xhci_hcd 0000:02:00.0: event = ffff88002177b610
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388583] xhci_hcd 0000:02:00.0: td->last_trb = ffff880021cb2420
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388586] xhci_hcd 0000:02:00.0: Event ring:
> ......
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388668] xhci_hcd 0000:02:00.0: @2177b5f0 21cb2230 00000000 0d0003c4 01058000
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388670] xhci_hcd 0000:02:00.0: @2177b600 2177b1c0 00000000 01000000 01008400
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388673] xhci_hcd 0000:02:00.0: @2177b610 00000000 00000000 17000000 01058000
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388676] xhci_hcd 0000:02:00.0: @2177b620 21cb2440 00000000 01000000 01058000
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388679] xhci_hcd 0000:02:00.0: @2177b630 21cb2450 00000000 01000000 01058000
> 
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.388753] xhci_hcd 0000:02:00.0: Endpoint ring:
> ......
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389345] xhci_hcd 0000:02:00.0: @21cb2230 1a707c44 00000000 00040b4c 80001425
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389348] xhci_hcd 0000:02:00.0: @21cb2240 00000000 00000000 00000000 00002001
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389351] xhci_hcd 0000:02:00.0: @21cb2250 00000000 00000000 00000000 00002001
> ......
> (There are all no-op trbs here. Seems cancelled tds)
> ......
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389419] xhci_hcd 0000:02:00.0: @21cb2400 00000000 00000000 00000000 00002001
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389422] xhci_hcd 0000:02:00.0: @21cb2410 00000000 00000000 00000000 00002001
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389425] xhci_hcd 0000:02:00.0: @21cb2420 1a720000 00000000 00040b4c 80001425
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389428] xhci_hcd 0000:02:00.0: @21cb2430 1a720b4c 00000000 00040b4c 80001425
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389430] xhci_hcd 0000:02:00.0: @21cb2440 1a721698 00000000 00040b4c 80001425
> Apr 29 01:44:23 brienza-laptop kernel: [ 1739.389433] xhci_hcd 0000:02:00.0: @21cb2450 1a7221e4 00000000 00040b4c 80001425
> 
> Here is what happen: td @21cb2230 is processed OK and there is a
> corresponding event @2177b5f0. Ignore event @2177b6f0, it's for control
> endpoint. Now there're no-op TRBs on the endpoint ring, probably
> cancelled TDs. event @2177b610 is a missed service error event. maybe
> it's due to the no-op TRBs on the ring?

Yes, it could be that the xHC couldn't process the no-op TRBs fast
enough to get to the next isoc TD at @21cb2420.

> When the xHC continues to process, you can see two valid isoc tds
> @21cb2420 and 21cb2430 have no corresponding events in the event ring.
> The next event is for td @21cb2440. The td @21cb2420 is stucked and many
> "ERROR Transfer event TRB DMA ptr not part of current TD" messages burst
> out.
> 
> Now that's the trick part. When submit one missed service error event,
> the xHC actually missed two isoc tds. (According to the spec, in this
> case there should be two missed service error events here - correct me
> if I'm wrong.)

I'll ask Steve about that.  It's not clear to me from the spec.  In
fact, section 4.11.2.3 says:

"The xHC may not generate a Missed Service Error for each Isochronous
deadline missed, e.g. if the Event Ring is full."

I would read that to mean that the xHC can skip arbitrary numbers of
isoc TDs and only generate one Missed Service Error, but I'll ask Steve
for clarification.

> Even if advance to next td when encounter missed service
> error event, in this case it will go to td @21cb2430 - unfortunately
> still stucked. I'll do more tests to see if this is always what happens
> and what should do about this. You can check your log and see if this is
> what happening at your side.

Hmm, maybe when we get a missed service event, we need to set a flag in
the endpoint.  On the next transfer event with a non-zero DMA pointer,
we search for the TD that corresponds to the DMA pointer, and mark all
isoc buffers that are before that TD with a short packet status (and
giveback any URBs that we complete in the process).

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux