On Tue, 15 Mar 2011, Xu, Andiry wrote: > > -----Original Message----- > > From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx] > > Sent: Monday, March 14, 2011 11:21 PM > > To: David Brownell; Xu, Andiry > > Cc: USB list > > Subject: RE: Question about HALT bit set of qTD in EHCI driver > > > > Dave: > > > > Can you help with this issue? Do you remember why the code below in > > qh_completions() sets the HALT bit in the QH overlay region? The > > comment doesn't seem to be relevant any more. > > > > /* force halt for unlinked or blocked qh, so > we'll > > * patch the qh later and so that completions > can't > > * activate it while we "know" it's stopped. > > */ > > if ((halt & hw->hw_token) == 0) { > > halt: > > hw->hw_token |= halt; > > wmb (); > > } > > > > Can we eliminate these statements? The fact that HALT is set in > > hw_token won't prevent the driver from doing anything: Contrary to > > what the comment says, whether or not the QH gets "patched" or > > reactivated has nothing to do with the HALT bit. > > > > In fact, there are circumstances in which the QH gets reactivated > > _without_ being updated (i.e., when an URB is unlinked before the > > hardware starts executing it. After the URB is given back the queue > > picks up right where it left off, so qh_refresh() doesn't call > > qh_update()). In this case setting the HALT bit would be a definite > > mistake! > > > > In fact, maybe that mistake is the reason for Andiry's problem... > > > > > > On Mon, 14 Mar 2011, Xu, Andiry wrote: > > > > > > Furthermore, these statements get executed in only three ways: > > > > > > > > The QH is unlinked. In this case the hardware should never > > > > see the QH so it shouldn't be confused by it. > > > > > > > > The QH is halted because an earlier qTD terminated with an > > > > error. In this case the Active bit should already be turned > > > > off, so turning on the Halt bit shouldn't hurt. > > > > > > > > A short read occurred. Is this the case that's bothering you? > > > > > > > > Now, I have to admit that I don't remember why this code sets the > Halt > > > > bit. Maybe it doesn't have to be done at all; I'm not sure. > > > > > The qh state is QH_STATE_IDLE when the driver set the Halt bit. No > error > > > or short read occurs. In this case HW should not see this qh, right? > > > > Right. QH_STATE_IDLE means the QH is no longer on the async list > _and_ > > an IAA interrupt has occurred since it was removed. > > > > If there was no error and no short read, then this statement must have > > been executed because an URB was unlinked. Is that what happened? > > > > Yes. The URB is unlinked. > > > > I've verified clear either the Active bit or the Halt bit in this > case > > > can workaround this issue. > > > > > > I've found that when disable async qh/qtd cache (a HW feature on our > > > platform), the issue disappears. Seems the HW caches the idle qh and > be > > > able to see it. > > > > That really sounds like a bug in the hardware design. The HW should > > _never_ cache QHs that have been fully removed from the async list. > > In fact, that's exactly what the IAA interrupt is supposed to mean -- > > it means that any QHs removed from the async list before the IAAD bit > > was set are no longer present in the HW cache. See section 4.8.2 in > > the EHCI spec. > > > > That's only my assumption. At this time we're still looking for root > cause and can not say it's a HW bug. The HW does fetch a qh overlay with > Active and Halt bit set and got confused when trying to process it, we > want to figure out why it's exposed to HW. > > > > Can I propose a workaround, do not set halt bit if it's an affected > > > platform and Active bit is set in this case? > > > > We may be able to avoid setting the halt bit entirely. Let's see what > > Dave says. > > > > OK, thanks. As it turns out, I was able to duplicate your result by creating a new test case for the usbtest driver. The new test caused ehci-hcd to hang on my Intel ICH5 motherboard, but after I removed the code that sets the Halt bit, everything worked correctly. The new usbtest case (test 17) submits N bulk-OUT URBs, then unlinks the URBs in position N-4 and N-2, and then waits for all the URBs to complete. Here is a usbmon trace of the test using the current ehci-hcd driver; I set N to 20 and the URB transfer lengths to 4096: e9f84840 2316090736 S Co:1:002:0 s 01 0b 0000 0000 0000 0 e9f84840 2316091362 C Co:1:002:0 0 0 e9f84840 2316091766 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84f20 2316091839 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84dc0 2316091892 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84c60 2316091944 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84a50 2316092033 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f848f0 2316092089 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84840 2316092105 C Bo:1:002:2 0 4096 > e9f849a0 2316092218 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84f20 2316092233 C Bo:1:002:2 0 4096 > ed74dbb0 2316092344 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84dc0 2316092359 C Bo:1:002:2 0 4096 > ed74dc60 2316092468 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84c60 2316092481 C Bo:1:002:2 0 4096 > ed74dd10 2316092559 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f84a50 2316092612 C Bo:1:002:2 0 4096 > ed74d420 2316092685 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f848f0 2316092753 C Bo:1:002:2 0 4096 > ed74d2c0 2316092813 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74d160 2316092894 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f849a0 2316092981 C Bo:1:002:2 0 4096 > ed74dbb0 2316093103 C Bo:1:002:2 0 4096 > ed74d370 2316093196 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74dc60 2316093228 C Bo:1:002:2 0 4096 > ed74db00 2316093293 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74dd10 2316093354 C Bo:1:002:2 0 4096 > eca498f0 2316093420 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74d420 2316093487 C Bo:1:002:2 0 4096 > eca49840 2316093549 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74d2c0 2316093629 C Bo:1:002:2 0 4096 > eca499a0 2316093677 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 eca49790 2316093730 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 eca49a50 2316093815 S Bo:1:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ed74d160 2316093853 C Bo:1:002:2 0 4096 > ed74d370 2316093980 C Bo:1:002:2 0 4096 > eca49840 2316093988 C Bo:1:002:2 -104 0 eca49790 2316093991 C Bo:1:002:2 -104 0 As you can see, all 20 URBs were submitted. 14 of them completed before the unlinks occurred, but then the remaining 4 did not complete. The testusb program hung. Here's the same test using a version of ehci-hcd in which the Halt bit doesn't get set: e9f849a0 2246902336 S Co:2:002:0 s 01 0b 0000 0000 0000 0 e9f849a0 2246903111 C Co:2:002:0 0 0 e9f849a0 2246903530 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f848f0 2246903608 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485000 2246903665 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee4850b0 2246903717 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485160 2246903803 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f849a0 2246903858 C Bo:2:002:2 0 4096 > ee485210 2246903911 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e9f848f0 2246903978 C Bo:2:002:2 0 4096 > ee4852c0 2246904045 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485000 2246904113 C Bo:2:002:2 0 4096 > ee485370 2246904178 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee4850b0 2246904253 C Bo:2:002:2 0 4096 > ee485420 2246904310 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee4854d0 2246904391 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485580 2246904450 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485160 2246904478 C Bo:2:002:2 0 4096 > ee485630 2246904578 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485210 2246904602 C Bo:2:002:2 0 4096 > ee4856e0 2246904707 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee4852c0 2246904727 C Bo:2:002:2 0 4096 > ee485790 2246904830 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485370 2246904854 C Bo:2:002:2 0 4096 > ee485840 2246904928 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485420 2246904989 C Bo:2:002:2 0 4096 > ee4858f0 2246905058 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee4854d0 2246905128 C Bo:2:002:2 0 4096 > ee4859a0 2246905189 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485a50 2246905272 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485b00 2246905330 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485580 2246905352 C Bo:2:002:2 0 4096 > ee485bb0 2246905459 S Bo:2:002:2 -115 4096 = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ee485630 2246905477 C Bo:2:002:2 0 4096 > ee4856e0 2246905728 C Bo:2:002:2 0 4096 > ee4859a0 2246905737 C Bo:2:002:2 -104 0 ee485b00 2246905740 C Bo:2:002:2 -104 0 ee485790 2246905868 C Bo:2:002:2 0 4096 > ee485840 2246906102 C Bo:2:002:2 0 4096 > ee4858f0 2246906228 C Bo:2:002:2 0 4096 > ee485a50 2246906352 C Bo:2:002:2 0 4096 > ee485bb0 2246906479 C Bo:2:002:2 0 4096 > Here 13 of the URBs completed before the unlinks occurred, and afterward the remaining 5 completed normally. Below is the patch I used for ehci-hcd. If you confirm that it fixes your problem then I will submit it Greg, along with the new usbtest code. Alan Stern drivers/usb/host/ehci-q.c | 12 ------------ 1 file changed, 12 deletions(-) Index: usb-2.6/drivers/usb/host/ehci-q.c =================================================================== --- usb-2.6.orig/drivers/usb/host/ehci-q.c +++ usb-2.6/drivers/usb/host/ehci-q.c @@ -315,7 +315,6 @@ qh_completions (struct ehci_hcd *ehci, s int stopped; unsigned count = 0; u8 state; - const __le32 halt = HALT_BIT(ehci); struct ehci_qh_hw *hw = qh->hw; if (unlikely (list_empty (&qh->qtd_list))) @@ -422,7 +421,6 @@ qh_completions (struct ehci_hcd *ehci, s && !(qtd->hw_alt_next & EHCI_LIST_END(ehci))) { stopped = 1; - goto halt; } /* stop scanning when we reach qtds the hc is using */ @@ -456,16 +454,6 @@ qh_completions (struct ehci_hcd *ehci, s */ ehci_clear_tt_buffer(ehci, qh, urb, token); } - - /* force halt for unlinked or blocked qh, so we'll - * patch the qh later and so that completions can't - * activate it while we "know" it's stopped. - */ - if ((halt & hw->hw_token) == 0) { -halt: - hw->hw_token |= halt; - wmb (); - } } /* unless we already know the urb's status, collect qtd status -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html