[Resend with Hans's correct email address this time...]

On Thu, 22 Oct 2015, Alan Stern wrote:

> Hans and everyone else:
>
> This continues the discussion of a problem originally posted to the
> libusb-devel mailing list
> (see <http://marc.info/?l=libusb-devel&m=144423444825269&w=2> if
> you're curious).
>
> The EHCI controller in question is an AMD/ATI SB7x0/SB8x0/SB9x0, as
> found on the RX780/RX790 motherboard. I haven't seen this problem
> occur with Intel hardware.
>
> The problem arises when an active bulk-in QH is removed from the async
> schedule. The current qTD is cancelled, and it is the last qTD on the
> QH's queue. At the time the QH is removed from the async list, the
> overlay region shows that only a fraction of the qTD has been completed
> (maybe 4 KB transferred out of 16 KB total).
>
> 10 ms later, four new qTDs are added to the QH and it gets added back
> to the async schedule. Although I don't know this for certain, I
> believe the second of these qTDs is stored at the same address as the
> one that was cancelled. That's what naturally would happen if the
> memory pool satisfies an allocation from the most recently freed area.
>
> Anyway, a short time later, it sometimes happens that the controller
> gets stuck. The Active bit in the QH's overlay region is clear, and
> the Current and Next qTD pointers both point to the second qTD in the
> queue, which obviously is why the controller is not making any forward
> progress. The first qTD's Active bit is still set and its Bytes To
> Transfer is still set to 16 KB. The second qTD's Active bit is off and
> its Bytes To Transfer is 0. In spite of this, neither qTD's data
> buffer has been overwritten.
>
> Although it's hard to tell exactly what went wrong, my guess is that
> after the QH was removed from the async schedule, the controller
> continued to process it until all 16 KB had been transferred. (This
> would have taken no more than 0.5 ms.)
> Then at some point, the QH
> overlay and the now-completed qTD were written back -- that would
> explain why the second qTD in the queue shows up as not Active and with
> no bytes remaining to transfer.
>
> On the other hand, that qTD wasn't reused until 10 ms after the QH was
> removed from the schedule, and it was completely reinitialized before
> reuse. The write-back must have occurred later than this; I have no
> idea why. I also don't know why the write-back of the QH's overlay
> region didn't overwrite the Next qTD pointer.
>
> This is clearly a complicated problem. It's possible that we're simply
> dealing with defective hardware, but I tend to doubt it. It seems more
> likely that the problem is caused by improperly removing the active QH
> from the async schedule. The driver does not follow the instructions
> given in section 4.8.2 of the EHCI spec, which says that software
> should not remove active QHs.
>
> [In practice it's not feasible to wait for an active QH to become
> inactive before removing it, for several reasons. For one, the QH may
> _never_ become inactive (if the endpoint NAKs indefinitely). For
> another, the procedure given in the spec (deactivate the qTDs on the
> queue) is racy, since the controller can perform a new overlay or
> writeback at any time.]
>
> In an attempt to cope with potential problems, the Linux EHCI driver
> goes through _two_ Interrupt on Async Advance (IAA) cycles after taking
> a QH off the async list before considering it to be fully gone from the
> schedule. (I have observed situations where the QH overlay region was
> written back _after_ the first IAA interrupt.) But it seems that this
> isn't enough.
>
> As far as I can see, the only alternative is to stop the async schedule
> whenever an active QH has to be removed. But that would impose a
> significant penalty on any other async transfers, so I really don't
> want to do it.
>
> Hans, can you describe how the BSD EHCI driver handles this issue?
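For anyone who wants to look for the stuck state described above (overlay Active bit clear, Current and Next qTD pointers both naming the same qTD), a check along the following lines might help while debugging. This is only a sketch and not code from the Linux driver: struct qh_snapshot and qh_looks_stuck() are made-up names, and the values are assumed to have been copied out of the QH and converted to CPU byte order already. Only the bit positions are taken from the qTD/QH layout in the EHCI spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Token dword fields, per the EHCI qTD layout. */
#define QTD_STS_ACTIVE   (1u << 7)                    /* Status: Active bit */
#define QTD_LENGTH(tok)  (((tok) >> 16) & 0x7fffu)    /* Total Bytes to Transfer */
#define QTD_PTR_MASK     (~0x1fu)                     /* low 5 bits are not address bits */
#define QTD_PTR_TERM     (1u << 0)                    /* T bit: pointer is not valid */

/*
 * Hypothetical CPU-side snapshot of the QH fields of interest.
 * This is an illustration only, not the driver's struct ehci_qh.
 */
struct qh_snapshot {
    uint32_t current_qtd;   /* Current qTD Pointer */
    uint32_t next_qtd;      /* overlay: Next qTD Pointer */
    uint32_t token;         /* overlay: token dword */
};

/*
 * True if the snapshot matches the reported stuck state: the overlay is
 * not Active, the Next qTD pointer is valid, and it refers to the same
 * qTD as the Current qTD pointer, so the controller cannot advance.
 */
static bool qh_looks_stuck(const struct qh_snapshot *qh)
{
    bool overlay_idle = !(qh->token & QTD_STS_ACTIVE);
    bool next_valid = !(qh->next_qtd & QTD_PTR_TERM);
    bool self_ref = (qh->current_qtd & QTD_PTR_MASK) ==
                    (qh->next_qtd & QTD_PTR_MASK);

    return overlay_idle && next_valid && self_ref;
}
```

QTD_LENGTH() is included so the same snapshot can also be used to print the bytes still outstanding in the overlay. A check like this only reports the symptom; it says nothing about when the late write-back happened.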
>
> Any ideas for fixing this or suggestions for additional debugging would
> be welcome.
>
> Alan Stern