Hi, On Mon, Aug 29, 2011 at 11:03 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > On Mon, 29 Aug 2011, Ming Lei wrote: > >> IMO, the dummy has been linked into queue pointed by qh->hw->hw_qtd_next, >> so EHCI will fetch dummy qtd and execute the transaction and will not have >> any delay on the transaction. >> >> Let me explain the problem again: On ARM, the wmb() before >> 'dummy->hw_token = token;' >> will flush l2 write buffer into memory and all parts of 'dummy' except >> for hw_token field >> have reached into memory already, but dummy->hw_token will stay at l2 >> write buffer >> and not reach into memory at this time, so ehci may fetch a >> inconsistent qtd and execute it, >> then mistaken IOC or "total bytes to transfer" are read by EHCI and >> cause delayed irq >> or lost irq. > > No. Even if the HC reads dummy before dummy->hw_token has been written > out to memory from the L2 cache, it will not see any inconsistencies. > It will see the old value in hw_token, which has the ACTIVE bit clear. > Therefore it will not try to execute the qTD but will move on to the > next QH. See what the fourth paragraph in section 4.10.2 of the EHCI > spec says about the case where a qTD's ACTIVE bit is set to 0. Suppose HC can see the old value in hw_token, which has the ACTIVE bit clear. The qtd transaction will not be executed until the new token is flushed into memory. >From view of CPU, the irq is still be delayed because ioc irq is generated after the qtd transaction is executed when new token is flushed into memory. The delay depends on how long the token will be flushed into memory. >From my observation, the delay is commonly over 5ms for CSW, sometimes the delay is more than 20ms, so caused the bad performance. Also I am not sure if EHCI can read the old hw_token correctly in this kind of inconsistent memory state. > > Some EHCI implementations have a quirk, in which they perform the > overlay even when ACTIVE is clear. But even these implementations > won't try to execute the qTD, because the old value of dummy->hw_token > also has the HALT bit set. > >> It is not only a reasoning or guess, and I have traced this kind of >> fact certainly. > > If your controller behaves as you suggest then it is buggy. And in > that case, adding another memory barrier won't fix it. There is still > the possibility that the HC will read dummy during the brief time after > the existing wmb() and before the CPU has written dummy->hw_token. No, the mb after 'dummy->hw_token=token' does fix the problem. As said above, IOC IRQ is surely delayed from view of CPU. thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html