> -----Original Message----- > From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx] > Sent: Wednesday, March 16, 2011 3:50 AM > To: Xu, Andiry > Cc: David Brownell; USB list > Subject: RE: Question about HALT bit set of qTD in EHCI driver > > On Tue, 15 Mar 2011, Xu, Andiry wrote: > > > > -----Original Message----- > > > From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx] > > > Sent: Monday, March 14, 2011 11:21 PM > > > To: David Brownell; Xu, Andiry > > > Cc: USB list > > > Subject: RE: Question about HALT bit set of qTD in EHCI driver > > > > > > Dave: > > > > > > Can you help with this issue? Do you remember why the code below in > > > qh_completions() sets the HALT bit in the QH overlay region? The > > > comment doesn't seem to be relevant any more. > > > > > > /* force halt for unlinked or blocked qh, so > > we'll > > > * patch the qh later and so that completions > > can't > > > * activate it while we "know" it's stopped. > > > */ > > > if ((halt & hw->hw_token) == 0) { > > > halt: > > > hw->hw_token |= halt; > > > wmb (); > > > } > > > > > > Can we eliminate these statements? The fact that HALT is set in > > > hw_token won't prevent the driver from doing anything: Contrary to > > > what the comment says, whether or not the QH gets "patched" or > > > reactivated has nothing to do with the HALT bit. > > > > > > In fact, there are circumstances in which the QH gets reactivated > > > _without_ being updated (i.e., when an URB is unlinked before the > > > hardware starts executing it. After the URB is given back the queue > > > picks up right where it left off, so qh_refresh() doesn't call > > > qh_update()). In this case setting the HALT bit would be a definite > > > mistake! > > > > > > In fact, maybe that mistake is the reason for Andiry's problem... > > > > > > > > > On Mon, 14 Mar 2011, Xu, Andiry wrote: > > > > > > > > Furthermore, these statements get executed in only three ways: > > > > > > > > > > The QH is unlinked. In this case the hardware should never > > > > > see the QH so it shouldn't be confused by it. > > > > > > > > > > The QH is halted because an earlier qTD terminated with an > > > > > error. In this case the Active bit should already be turned > > > > > off, so turning on the Halt bit shouldn't hurt. > > > > > > > > > > A short read occurred. Is this the case that's bothering you? > > > > > > > > > > Now, I have to admit that I don't remember why this code sets the > > Halt > > > > > bit. Maybe it doesn't have to be done at all; I'm not sure. > > > > > > > The qh state is QH_STATE_IDLE when the driver set the Halt bit. No > > error > > > > or short read occurs. In this case HW should not see this qh, right? > > > > > > Right. QH_STATE_IDLE means the QH is no longer on the async list > > _and_ > > > an IAA interrupt has occurred since it was removed. > > > > > > If there was no error and no short read, then this statement must have > > > been executed because an URB was unlinked. Is that what happened? > > > > > > > Yes. The URB is unlinked. > > > > > > I've verified clear either the Active bit or the Halt bit in this > > case > > > > can workaround this issue. > > > > > > > > I've found that when disable async qh/qtd cache (a HW feature on our > > > > platform), the issue disappears. Seems the HW caches the idle qh and > > be > > > > able to see it. > > > > > > That really sounds like a bug in the hardware design. The HW should > > > _never_ cache QHs that have been fully removed from the async list. > > > In fact, that's exactly what the IAA interrupt is supposed to mean -- > > > it means that any QHs removed from the async list before the IAAD bit > > > was set are no longer present in the HW cache. See section 4.8.2 in > > > the EHCI spec. > > > > > > > That's only my assumption. At this time we're still looking for root > > cause and can not say it's a HW bug. The HW does fetch a qh overlay with > > Active and Halt bit set and got confused when trying to process it, we > > want to figure out why it's exposed to HW. > > > > > > Can I propose a workaround, do not set halt bit if it's an affected > > > > platform and Active bit is set in this case? > > > > > > We may be able to avoid setting the halt bit entirely. Let's see what > > > Dave says. > > > > > > > OK, thanks. > > As it turns out, I was able to duplicate your result by creating a new > test case for the usbtest driver. The new test caused ehci-hcd to hang > on my Intel ICH5 motherboard, but after I removed the code that sets > the Halt bit, everything worked correctly. > > The new usbtest case (test 17) submits N bulk-OUT URBs, then unlinks > the URBs in position N-4 and N-2, and then waits for all the URBs to > complete. Here is a usbmon trace of the test using the current > ehci-hcd driver; I set N to 20 and the URB transfer lengths to 4096: > > e9f84840 2316090736 S Co:1:002:0 s 01 0b 0000 0000 0000 0 > e9f84840 2316091362 C Co:1:002:0 0 0 > e9f84840 2316091766 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84f20 2316091839 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84dc0 2316091892 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84c60 2316091944 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84a50 2316092033 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f848f0 2316092089 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84840 2316092105 C Bo:1:002:2 0 4096 > > e9f849a0 2316092218 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84f20 2316092233 C Bo:1:002:2 0 4096 > > ed74dbb0 2316092344 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84dc0 2316092359 C Bo:1:002:2 0 4096 > > ed74dc60 2316092468 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84c60 2316092481 C Bo:1:002:2 0 4096 > > ed74dd10 2316092559 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f84a50 2316092612 C Bo:1:002:2 0 4096 > > ed74d420 2316092685 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f848f0 2316092753 C Bo:1:002:2 0 4096 > > ed74d2c0 2316092813 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74d160 2316092894 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f849a0 2316092981 C Bo:1:002:2 0 4096 > > ed74dbb0 2316093103 C Bo:1:002:2 0 4096 > > ed74d370 2316093196 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74dc60 2316093228 C Bo:1:002:2 0 4096 > > ed74db00 2316093293 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74dd10 2316093354 C Bo:1:002:2 0 4096 > > eca498f0 2316093420 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74d420 2316093487 C Bo:1:002:2 0 4096 > > eca49840 2316093549 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74d2c0 2316093629 C Bo:1:002:2 0 4096 > > eca499a0 2316093677 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > eca49790 2316093730 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > eca49a50 2316093815 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ed74d160 2316093853 C Bo:1:002:2 0 4096 > > ed74d370 2316093980 C Bo:1:002:2 0 4096 > > eca49840 2316093988 C Bo:1:002:2 -104 0 > eca49790 2316093991 C Bo:1:002:2 -104 0 > > As you can see, all 20 URBs were submitted. 14 of them completed > before the unlinks occurred, but then the remaining 4 did not complete. > The testusb program hung. > > Here's the same test using a version of ehci-hcd in which the Halt bit > doesn't get set: > > e9f849a0 2246902336 S Co:2:002:0 s 01 0b 0000 0000 0000 0 > e9f849a0 2246903111 C Co:2:002:0 0 0 > e9f849a0 2246903530 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f848f0 2246903608 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485000 2246903665 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee4850b0 2246903717 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485160 2246903803 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f849a0 2246903858 C Bo:2:002:2 0 4096 > > ee485210 2246903911 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > e9f848f0 2246903978 C Bo:2:002:2 0 4096 > > ee4852c0 2246904045 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485000 2246904113 C Bo:2:002:2 0 4096 > > ee485370 2246904178 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee4850b0 2246904253 C Bo:2:002:2 0 4096 > > ee485420 2246904310 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee4854d0 2246904391 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485580 2246904450 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485160 2246904478 C Bo:2:002:2 0 4096 > > ee485630 2246904578 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485210 2246904602 C Bo:2:002:2 0 4096 > > ee4856e0 2246904707 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee4852c0 2246904727 C Bo:2:002:2 0 4096 > > ee485790 2246904830 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485370 2246904854 C Bo:2:002:2 0 4096 > > ee485840 2246904928 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485420 2246904989 C Bo:2:002:2 0 4096 > > ee4858f0 2246905058 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee4854d0 2246905128 C Bo:2:002:2 0 4096 > > ee4859a0 2246905189 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485a50 2246905272 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485b00 2246905330 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485580 2246905352 C Bo:2:002:2 0 4096 > > ee485bb0 2246905459 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 > ee485630 2246905477 C Bo:2:002:2 0 4096 > > ee4856e0 2246905728 C Bo:2:002:2 0 4096 > > ee4859a0 2246905737 C Bo:2:002:2 -104 0 > ee485b00 2246905740 C Bo:2:002:2 -104 0 > ee485790 2246905868 C Bo:2:002:2 0 4096 > > ee485840 2246906102 C Bo:2:002:2 0 4096 > > ee4858f0 2246906228 C Bo:2:002:2 0 4096 > > ee485a50 2246906352 C Bo:2:002:2 0 4096 > > ee485bb0 2246906479 C Bo:2:002:2 0 4096 > > > Here 13 of the URBs completed before the unlinks occurred, and > afterward the remaining 5 completed normally. > > Below is the patch I used for ehci-hcd. If you confirm that it fixes > your problem then I will submit it Greg, along with the new usbtest > code. > Yes, I've verified that the patch below works for me. Please submit it to Greg, thanks a lot for your help. Tested-by: Andiry Xu <andiry.xu@xxxxxxx> > > > drivers/usb/host/ehci-q.c | 12 ------------ > 1 file changed, 12 deletions(-) > > Index: usb-2.6/drivers/usb/host/ehci-q.c > =================================================================== > --- usb-2.6.orig/drivers/usb/host/ehci-q.c > +++ usb-2.6/drivers/usb/host/ehci-q.c > @@ -315,7 +315,6 @@ qh_completions (struct ehci_hcd *ehci, s > int stopped; > unsigned count = 0; > u8 state; > - const __le32 halt = HALT_BIT(ehci); > struct ehci_qh_hw *hw = qh->hw; > > if (unlikely (list_empty (&qh->qtd_list))) > @@ -422,7 +421,6 @@ qh_completions (struct ehci_hcd *ehci, s > && !(qtd->hw_alt_next > & EHCI_LIST_END(ehci))) { > stopped = 1; > - goto halt; > } > > /* stop scanning when we reach qtds the hc is using */ > @@ -456,16 +454,6 @@ qh_completions (struct ehci_hcd *ehci, s > */ > ehci_clear_tt_buffer(ehci, qh, urb, token); > } > - > - /* force halt for unlinked or blocked qh, so we'll > - * patch the qh later and so that completions can't > - * activate it while we "know" it's stopped. > - */ > - if ((halt & hw->hw_token) == 0) { > -halt: > - hw->hw_token |= halt; > - wmb (); > - } > } > > /* unless we already know the urb's status, collect qtd status > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html