RE: Question about HALT bit set of qTD in EHCI driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 15 Mar 2011, Xu, Andiry wrote:

> > -----Original Message-----
> > From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx]
> > Sent: Monday, March 14, 2011 11:21 PM
> > To: David Brownell; Xu, Andiry
> > Cc: USB list
> > Subject: RE: Question about HALT bit set of qTD in EHCI driver
> > 
> > Dave:
> > 
> > Can you help with this issue?  Do you remember why the code below in
> > qh_completions() sets the HALT bit in the QH overlay region?  The
> > comment doesn't seem to be relevant any more.
> > 
> > 			/* force halt for unlinked or blocked qh, so
> we'll
> > 			 * patch the qh later and so that completions
> can't
> > 			 * activate it while we "know" it's stopped.
> > 			 */
> > 			if ((halt & hw->hw_token) == 0) {
> > halt:
> > 				hw->hw_token |= halt;
> > 				wmb ();
> > 			}
> > 
> > Can we eliminate these statements?  The fact that HALT is set in
> > hw_token won't prevent the driver from doing anything: Contrary to
> > what the comment says, whether or not the QH gets "patched" or
> > reactivated has nothing to do with the HALT bit.
> > 
> > In fact, there are circumstances in which the QH gets reactivated
> > _without_ being updated (i.e., when an URB is unlinked before the
> > hardware starts executing it.  After the URB is given back the queue
> > picks up right where it left off, so qh_refresh() doesn't call
> > qh_update()).  In this case setting the HALT bit would be a definite
> > mistake!
> > 
> > In fact, maybe that mistake is the reason for Andiry's problem...
> > 
> > 
> > On Mon, 14 Mar 2011, Xu, Andiry wrote:
> > 
> > > > Furthermore, these statements get executed in only three ways:
> > > >
> > > > 	The QH is unlinked.  In this case the hardware should never
> > > > 	see the QH so it shouldn't be confused by it.
> > > >
> > > > 	The QH is halted because an earlier qTD terminated with an
> > > > 	error.  In this case the Active bit should already be turned
> > > > 	off, so turning on the Halt bit shouldn't hurt.
> > > >
> > > > 	A short read occurred.  Is this the case that's bothering you?
> > > >
> > > > Now, I have to admit that I don't remember why this code sets the
> Halt
> > > > bit.  Maybe it doesn't have to be done at all; I'm not sure.
> > 
> > > The qh state is QH_STATE_IDLE when the driver set the Halt bit. No
> error
> > > or short read occurs. In this case HW should not see this qh, right?
> > 
> > Right.  QH_STATE_IDLE means the QH is no longer on the async list
> _and_
> > an IAA interrupt has occurred since it was removed.
> > 
> > If there was no error and no short read, then this statement must have
> > been executed because an URB was unlinked.  Is that what happened?
> > 
> 
> Yes. The URB is unlinked.
> 
> > > I've verified clear either the Active bit or the Halt bit in this
> case
> > > can workaround this issue.
> > >
> > > I've found that when disable async qh/qtd cache (a HW feature on our
> > > platform), the issue disappears. Seems the HW caches the idle qh and
> be
> > > able to see it.
> > 
> > That really sounds like a bug in the hardware design.  The HW should
> > _never_ cache QHs that have been fully removed from the async list.
> > In fact, that's exactly what the IAA interrupt is supposed to mean --
> > it means that any QHs removed from the async list before the IAAD bit
> > was set are no longer present in the HW cache.  See section 4.8.2 in
> > the EHCI spec.
> > 
> 
> That's only my assumption. At this time we're still looking for root
> cause and can not say it's a HW bug. The HW does fetch a qh overlay with
> Active and Halt bit set and got confused when trying to process it, we
> want to figure out why it's exposed to HW. 
> 
> > > Can I propose a workaround, do not set halt bit if it's an affected
> > > platform and Active bit is set in this case?
> > 
> > We may be able to avoid setting the halt bit entirely.  Let's see what
> > Dave says.
> > 
> 
> OK, thanks.

As it turns out, I was able to duplicate your result by creating a new
test case for the usbtest driver.  The new test caused ehci-hcd to hang
on my Intel ICH5 motherboard, but after I removed the code that sets
the Halt bit, everything worked correctly.

The new usbtest case (test 17) submits N bulk-OUT URBs, then unlinks
the URBs in position N-4 and N-2, and then waits for all the URBs to
complete.  Here is a usbmon trace of the test using the current
ehci-hcd driver; I set N to 20 and the URB transfer lengths to 4096:

e9f84840 2316090736 S Co:1:002:0 s 01 0b 0000 0000 0000 0
e9f84840 2316091362 C Co:1:002:0 0 0
e9f84840 2316091766 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84f20 2316091839 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84dc0 2316091892 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84c60 2316091944 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84a50 2316092033 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f848f0 2316092089 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84840 2316092105 C Bo:1:002:2 0 4096 >
e9f849a0 2316092218 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84f20 2316092233 C Bo:1:002:2 0 4096 >
ed74dbb0 2316092344 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84dc0 2316092359 C Bo:1:002:2 0 4096 >
ed74dc60 2316092468 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84c60 2316092481 C Bo:1:002:2 0 4096 >
ed74dd10 2316092559 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f84a50 2316092612 C Bo:1:002:2 0 4096 >
ed74d420 2316092685 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f848f0 2316092753 C Bo:1:002:2 0 4096 >
ed74d2c0 2316092813 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74d160 2316092894 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f849a0 2316092981 C Bo:1:002:2 0 4096 >
ed74dbb0 2316093103 C Bo:1:002:2 0 4096 >
ed74d370 2316093196 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74dc60 2316093228 C Bo:1:002:2 0 4096 >
ed74db00 2316093293 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74dd10 2316093354 C Bo:1:002:2 0 4096 >
eca498f0 2316093420 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74d420 2316093487 C Bo:1:002:2 0 4096 >
eca49840 2316093549 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74d2c0 2316093629 C Bo:1:002:2 0 4096 >
eca499a0 2316093677 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
eca49790 2316093730 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
eca49a50 2316093815 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ed74d160 2316093853 C Bo:1:002:2 0 4096 >
ed74d370 2316093980 C Bo:1:002:2 0 4096 >
eca49840 2316093988 C Bo:1:002:2 -104 0
eca49790 2316093991 C Bo:1:002:2 -104 0

As you can see, all 20 URBs were submitted.  14 of them completed 
before the unlinks occurred, but then the remaining 4 did not complete.  
The testusb program hung.

Here's the same test using a version of ehci-hcd in which the Halt bit 
doesn't get set:

e9f849a0 2246902336 S Co:2:002:0 s 01 0b 0000 0000 0000 0
e9f849a0 2246903111 C Co:2:002:0 0 0
e9f849a0 2246903530 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f848f0 2246903608 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485000 2246903665 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee4850b0 2246903717 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485160 2246903803 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f849a0 2246903858 C Bo:2:002:2 0 4096 >
ee485210 2246903911 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
e9f848f0 2246903978 C Bo:2:002:2 0 4096 >
ee4852c0 2246904045 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485000 2246904113 C Bo:2:002:2 0 4096 >
ee485370 2246904178 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee4850b0 2246904253 C Bo:2:002:2 0 4096 >
ee485420 2246904310 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee4854d0 2246904391 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485580 2246904450 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485160 2246904478 C Bo:2:002:2 0 4096 >
ee485630 2246904578 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485210 2246904602 C Bo:2:002:2 0 4096 >
ee4856e0 2246904707 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee4852c0 2246904727 C Bo:2:002:2 0 4096 >
ee485790 2246904830 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485370 2246904854 C Bo:2:002:2 0 4096 >
ee485840 2246904928 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485420 2246904989 C Bo:2:002:2 0 4096 >
ee4858f0 2246905058 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee4854d0 2246905128 C Bo:2:002:2 0 4096 >
ee4859a0 2246905189 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485a50 2246905272 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485b00 2246905330 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485580 2246905352 C Bo:2:002:2 0 4096 >
ee485bb0 2246905459 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ee485630 2246905477 C Bo:2:002:2 0 4096 >
ee4856e0 2246905728 C Bo:2:002:2 0 4096 >
ee4859a0 2246905737 C Bo:2:002:2 -104 0
ee485b00 2246905740 C Bo:2:002:2 -104 0
ee485790 2246905868 C Bo:2:002:2 0 4096 >
ee485840 2246906102 C Bo:2:002:2 0 4096 >
ee4858f0 2246906228 C Bo:2:002:2 0 4096 >
ee485a50 2246906352 C Bo:2:002:2 0 4096 >
ee485bb0 2246906479 C Bo:2:002:2 0 4096 >

Here 13 of the URBs completed before the unlinks occurred, and
afterward the remaining 5 completed normally.

Below is the patch I used for ehci-hcd.  If you confirm that it fixes 
your problem then I will submit it Greg, along with the new usbtest 
code.

Alan Stern



 drivers/usb/host/ehci-q.c |   12 ------------
 1 file changed, 12 deletions(-)

Index: usb-2.6/drivers/usb/host/ehci-q.c
===================================================================
--- usb-2.6.orig/drivers/usb/host/ehci-q.c
+++ usb-2.6/drivers/usb/host/ehci-q.c
@@ -315,7 +315,6 @@ qh_completions (struct ehci_hcd *ehci, s
 	int			stopped;
 	unsigned		count = 0;
 	u8			state;
-	const __le32		halt = HALT_BIT(ehci);
 	struct ehci_qh_hw	*hw = qh->hw;
 
 	if (unlikely (list_empty (&qh->qtd_list)))
@@ -422,7 +421,6 @@ qh_completions (struct ehci_hcd *ehci, s
 					&& !(qtd->hw_alt_next
 						& EHCI_LIST_END(ehci))) {
 				stopped = 1;
-				goto halt;
 			}
 
 		/* stop scanning when we reach qtds the hc is using */
@@ -456,16 +454,6 @@ qh_completions (struct ehci_hcd *ehci, s
 				 */
 				ehci_clear_tt_buffer(ehci, qh, urb, token);
 			}
-
-			/* force halt for unlinked or blocked qh, so we'll
-			 * patch the qh later and so that completions can't
-			 * activate it while we "know" it's stopped.
-			 */
-			if ((halt & hw->hw_token) == 0) {
-halt:
-				hw->hw_token |= halt;
-				wmb ();
-			}
 		}
 
 		/* unless we already know the urb's status, collect qtd status

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux