RE: Question about HALT bit set of qTD in EHCI driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx]
> Sent: Wednesday, March 16, 2011 3:50 AM
> To: Xu, Andiry
> Cc: David Brownell; USB list
> Subject: RE: Question about HALT bit set of qTD in EHCI driver
> 
> On Tue, 15 Mar 2011, Xu, Andiry wrote:
> 
> > > -----Original Message-----
> > > From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx]
> > > Sent: Monday, March 14, 2011 11:21 PM
> > > To: David Brownell; Xu, Andiry
> > > Cc: USB list
> > > Subject: RE: Question about HALT bit set of qTD in EHCI driver
> > >
> > > Dave:
> > >
> > > Can you help with this issue?  Do you remember why the code below
in
> > > qh_completions() sets the HALT bit in the QH overlay region?  The
> > > comment doesn't seem to be relevant any more.
> > >
> > > 			/* force halt for unlinked or blocked qh, so
> > we'll
> > > 			 * patch the qh later and so that completions
> > can't
> > > 			 * activate it while we "know" it's stopped.
> > > 			 */
> > > 			if ((halt & hw->hw_token) == 0) {
> > > halt:
> > > 				hw->hw_token |= halt;
> > > 				wmb ();
> > > 			}
> > >
> > > Can we eliminate these statements?  The fact that HALT is set in
> > > hw_token won't prevent the driver from doing anything: Contrary to
> > > what the comment says, whether or not the QH gets "patched" or
> > > reactivated has nothing to do with the HALT bit.
> > >
> > > In fact, there are circumstances in which the QH gets reactivated
> > > _without_ being updated (i.e., when an URB is unlinked before the
> > > hardware starts executing it.  After the URB is given back the
queue
> > > picks up right where it left off, so qh_refresh() doesn't call
> > > qh_update()).  In this case setting the HALT bit would be a
definite
> > > mistake!
> > >
> > > In fact, maybe that mistake is the reason for Andiry's problem...
> > >
> > >
> > > On Mon, 14 Mar 2011, Xu, Andiry wrote:
> > >
> > > > > Furthermore, these statements get executed in only three ways:
> > > > >
> > > > > 	The QH is unlinked.  In this case the hardware should
never
> > > > > 	see the QH so it shouldn't be confused by it.
> > > > >
> > > > > 	The QH is halted because an earlier qTD terminated with
an
> > > > > 	error.  In this case the Active bit should already be
turned
> > > > > 	off, so turning on the Halt bit shouldn't hurt.
> > > > >
> > > > > 	A short read occurred.  Is this the case that's
bothering you?
> > > > >
> > > > > Now, I have to admit that I don't remember why this code sets
the
> > Halt
> > > > > bit.  Maybe it doesn't have to be done at all; I'm not sure.
> > >
> > > > The qh state is QH_STATE_IDLE when the driver set the Halt bit.
No
> > error
> > > > or short read occurs. In this case HW should not see this qh,
right?
> > >
> > > Right.  QH_STATE_IDLE means the QH is no longer on the async list
> > _and_
> > > an IAA interrupt has occurred since it was removed.
> > >
> > > If there was no error and no short read, then this statement must
have
> > > been executed because an URB was unlinked.  Is that what happened?
> > >
> >
> > Yes. The URB is unlinked.
> >
> > > > I've verified clear either the Active bit or the Halt bit in
this
> > case
> > > > can workaround this issue.
> > > >
> > > > I've found that when disable async qh/qtd cache (a HW feature on
our
> > > > platform), the issue disappears. Seems the HW caches the idle qh
and
> > be
> > > > able to see it.
> > >
> > > That really sounds like a bug in the hardware design.  The HW
should
> > > _never_ cache QHs that have been fully removed from the async
list.
> > > In fact, that's exactly what the IAA interrupt is supposed to mean
--
> > > it means that any QHs removed from the async list before the IAAD
bit
> > > was set are no longer present in the HW cache.  See section 4.8.2
in
> > > the EHCI spec.
> > >
> >
> > That's only my assumption. At this time we're still looking for root
> > cause and can not say it's a HW bug. The HW does fetch a qh overlay
with
> > Active and Halt bit set and got confused when trying to process it,
we
> > want to figure out why it's exposed to HW.
> >
> > > > Can I propose a workaround, do not set halt bit if it's an
affected
> > > > platform and Active bit is set in this case?
> > >
> > > We may be able to avoid setting the halt bit entirely.  Let's see
what
> > > Dave says.
> > >
> >
> > OK, thanks.
> 
> As it turns out, I was able to duplicate your result by creating a new
> test case for the usbtest driver.  The new test caused ehci-hcd to
hang
> on my Intel ICH5 motherboard, but after I removed the code that sets
> the Halt bit, everything worked correctly.
> 
> The new usbtest case (test 17) submits N bulk-OUT URBs, then unlinks
> the URBs in position N-4 and N-2, and then waits for all the URBs to
> complete.  Here is a usbmon trace of the test using the current
> ehci-hcd driver; I set N to 20 and the URB transfer lengths to 4096:
> 
> e9f84840 2316090736 S Co:1:002:0 s 01 0b 0000 0000 0000 0
> e9f84840 2316091362 C Co:1:002:0 0 0
> e9f84840 2316091766 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84f20 2316091839 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84dc0 2316091892 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84c60 2316091944 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84a50 2316092033 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f848f0 2316092089 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84840 2316092105 C Bo:1:002:2 0 4096 >
> e9f849a0 2316092218 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84f20 2316092233 C Bo:1:002:2 0 4096 >
> ed74dbb0 2316092344 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84dc0 2316092359 C Bo:1:002:2 0 4096 >
> ed74dc60 2316092468 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84c60 2316092481 C Bo:1:002:2 0 4096 >
> ed74dd10 2316092559 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f84a50 2316092612 C Bo:1:002:2 0 4096 >
> ed74d420 2316092685 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f848f0 2316092753 C Bo:1:002:2 0 4096 >
> ed74d2c0 2316092813 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74d160 2316092894 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f849a0 2316092981 C Bo:1:002:2 0 4096 >
> ed74dbb0 2316093103 C Bo:1:002:2 0 4096 >
> ed74d370 2316093196 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74dc60 2316093228 C Bo:1:002:2 0 4096 >
> ed74db00 2316093293 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74dd10 2316093354 C Bo:1:002:2 0 4096 >
> eca498f0 2316093420 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74d420 2316093487 C Bo:1:002:2 0 4096 >
> eca49840 2316093549 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74d2c0 2316093629 C Bo:1:002:2 0 4096 >
> eca499a0 2316093677 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> eca49790 2316093730 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> eca49a50 2316093815 S Bo:1:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ed74d160 2316093853 C Bo:1:002:2 0 4096 >
> ed74d370 2316093980 C Bo:1:002:2 0 4096 >
> eca49840 2316093988 C Bo:1:002:2 -104 0
> eca49790 2316093991 C Bo:1:002:2 -104 0
> 
> As you can see, all 20 URBs were submitted.  14 of them completed
> before the unlinks occurred, but then the remaining 4 did not
complete.
> The testusb program hung.
> 
> Here's the same test using a version of ehci-hcd in which the Halt bit
> doesn't get set:
> 
> e9f849a0 2246902336 S Co:2:002:0 s 01 0b 0000 0000 0000 0
> e9f849a0 2246903111 C Co:2:002:0 0 0
> e9f849a0 2246903530 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f848f0 2246903608 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485000 2246903665 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee4850b0 2246903717 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485160 2246903803 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f849a0 2246903858 C Bo:2:002:2 0 4096 >
> ee485210 2246903911 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> e9f848f0 2246903978 C Bo:2:002:2 0 4096 >
> ee4852c0 2246904045 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485000 2246904113 C Bo:2:002:2 0 4096 >
> ee485370 2246904178 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee4850b0 2246904253 C Bo:2:002:2 0 4096 >
> ee485420 2246904310 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee4854d0 2246904391 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485580 2246904450 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485160 2246904478 C Bo:2:002:2 0 4096 >
> ee485630 2246904578 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485210 2246904602 C Bo:2:002:2 0 4096 >
> ee4856e0 2246904707 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee4852c0 2246904727 C Bo:2:002:2 0 4096 >
> ee485790 2246904830 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485370 2246904854 C Bo:2:002:2 0 4096 >
> ee485840 2246904928 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485420 2246904989 C Bo:2:002:2 0 4096 >
> ee4858f0 2246905058 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee4854d0 2246905128 C Bo:2:002:2 0 4096 >
> ee4859a0 2246905189 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485a50 2246905272 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485b00 2246905330 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485580 2246905352 C Bo:2:002:2 0 4096 >
> ee485bb0 2246905459 S Bo:2:002:2 -115 4096 = 00000000 00000000
00000000
> 00000000 00000000 00000000 00000000 00000000
> ee485630 2246905477 C Bo:2:002:2 0 4096 >
> ee4856e0 2246905728 C Bo:2:002:2 0 4096 >
> ee4859a0 2246905737 C Bo:2:002:2 -104 0
> ee485b00 2246905740 C Bo:2:002:2 -104 0
> ee485790 2246905868 C Bo:2:002:2 0 4096 >
> ee485840 2246906102 C Bo:2:002:2 0 4096 >
> ee4858f0 2246906228 C Bo:2:002:2 0 4096 >
> ee485a50 2246906352 C Bo:2:002:2 0 4096 >
> ee485bb0 2246906479 C Bo:2:002:2 0 4096 >
> 
> Here 13 of the URBs completed before the unlinks occurred, and
> afterward the remaining 5 completed normally.
> 
> Below is the patch I used for ehci-hcd.  If you confirm that it fixes
> your problem then I will submit it Greg, along with the new usbtest
> code.
> 

Yes, I've verified that the patch below works for me. Please submit it
to Greg, thanks a lot for your help.

Tested-by: Andiry Xu <andiry.xu@xxxxxxx>


> 
> 
>  drivers/usb/host/ehci-q.c |   12 ------------
>  1 file changed, 12 deletions(-)
> 
> Index: usb-2.6/drivers/usb/host/ehci-q.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/host/ehci-q.c
> +++ usb-2.6/drivers/usb/host/ehci-q.c
> @@ -315,7 +315,6 @@ qh_completions (struct ehci_hcd *ehci, s
>  	int			stopped;
>  	unsigned		count = 0;
>  	u8			state;
> -	const __le32		halt = HALT_BIT(ehci);
>  	struct ehci_qh_hw	*hw = qh->hw;
> 
>  	if (unlikely (list_empty (&qh->qtd_list)))
> @@ -422,7 +421,6 @@ qh_completions (struct ehci_hcd *ehci, s
>  					&& !(qtd->hw_alt_next
>  						& EHCI_LIST_END(ehci)))
{
>  				stopped = 1;
> -				goto halt;
>  			}
> 
>  		/* stop scanning when we reach qtds the hc is using */
> @@ -456,16 +454,6 @@ qh_completions (struct ehci_hcd *ehci, s
>  				 */
>  				ehci_clear_tt_buffer(ehci, qh, urb,
token);
>  			}
> -
> -			/* force halt for unlinked or blocked qh, so
we'll
> -			 * patch the qh later and so that completions
can't
> -			 * activate it while we "know" it's stopped.
> -			 */
> -			if ((halt & hw->hw_token) == 0) {
> -halt:
> -				hw->hw_token |= halt;
> -				wmb ();
> -			}
>  		}
> 
>  		/* unless we already know the urb's status, collect qtd
status
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux