On Thu, Oct 22, 2015 at 05:14:41PM -0400, Alan Stern wrote:
> [Resend with Hans's correct email address this time...]
>
> On Thu, 22 Oct 2015, Alan Stern wrote:
>
> > Hans and everyone else:
> >
> > This continues the discussion of a problem originally posted to the
> > libusb-devel mailing list
> > (see <http://marc.info/?l=libusb-devel&m=144423444825269&w=2> if
> > you're curious).
> >
> > The EHCI controller in question is an AMD/ATI SB7x0/SB8x0/SB9x0, as
> > found on the RX780/RX790 motherboard. I haven't seen this problem
> > occur with Intel hardware.
> >
> > The problem arises when an active bulk-in QH is removed from the async
> > schedule. The current qTD is cancelled, and it is the last qTD on the
> > QH's queue. At the time the QH is removed from the async list, the
> > overlay region shows that only a fraction of the qTD has been completed
> > (maybe 4 KB transferred out of 16 KB total).
> >
> > 10 ms later, four new qTDs are added to the QH and it gets added back
> > to the async schedule. Although I don't know this for certain, I
> > believe the second of these qTDs is stored at the same address as the
> > one that was cancelled. That's what naturally would happen if the
> > memory pool satisfies an allocation from the most recently freed area.
> >
> > Anyway, a short time later, it sometimes happens that the controller
> > gets stuck. The Active bit in the QH's overlay region is clear, and
> > the Current and Next qTD pointers both point to the second qTD in the
> > queue, which obviously is why the controller is not making any forward
> > progress. The first qTD's Active bit is still set and its Bytes To
> > Transfer is still set to 16 KB. The second qTD's Active bit is off and
> > its Bytes To Transfer is 0. In spite of this, neither qTD's data
> > buffer has been overwritten.
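
To make the stuck state described in the paragraph above concrete, here
is a minimal sketch in C of a check for it. The struct layouts are
simplified, host-endian views, and every name (sketch_qtd,
sketch_qh_overlay, qh_looks_stuck, example_from_report) is hypothetical
rather than taken from ehci-hcd. Only the token bit positions (Active is
bit 7, Total Bytes to Transfer is bits 30:16) and the 32-byte pointer
alignment are meant to follow the EHCI spec. The DMA addresses in the
example are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions as laid out for the qTD token in the EHCI spec. */
#define QTD_STS_ACTIVE   (1u << 7)                  /* Active status bit */
#define QTD_LENGTH(tok)  (((tok) >> 16) & 0x7fffu)  /* Total Bytes to Transfer */
#define QTD_PTR_MASK     (~(uint32_t)0x1f)          /* pointers are 32-byte aligned */
#define QTD_PTR_TERM     1u                         /* T bit: pointer is not valid */

/* Hypothetical simplified view of a qTD (not the real ehci-hcd struct). */
struct sketch_qtd {
    uint32_t dma;    /* bus address of this qTD */
    uint32_t token;  /* status and length fields */
};

/* Hypothetical simplified view of the QH overlay region. */
struct sketch_qh_overlay {
    uint32_t current_qtd;
    uint32_t next_qtd;
    uint32_t token;
};

/*
 * Returns true when the overlay is idle, Current and Next point at the
 * same qTD, and the first queued qTD still has work pending. This is
 * the combination reported in the message quoted above.
 */
static bool qh_looks_stuck(const struct sketch_qh_overlay *ov,
                           const struct sketch_qtd *first)
{
    bool overlay_idle = !(ov->token & QTD_STS_ACTIVE);
    bool next_valid = !(ov->next_qtd & QTD_PTR_TERM);
    bool points_at_self = (ov->current_qtd & QTD_PTR_MASK) ==
                          (ov->next_qtd & QTD_PTR_MASK);
    bool work_pending = (first->token & QTD_STS_ACTIVE) &&
                        QTD_LENGTH(first->token) > 0;

    return overlay_idle && next_valid && points_at_self && work_pending;
}

/* Builds the reported state using made-up addresses and checks it. */
static void example_from_report(void)
{
    /* First qTD: still Active, 16 KB still to transfer. */
    struct sketch_qtd first = { 0x1000u, QTD_STS_ACTIVE | (16384u << 16) };
    /* Second qTD: not Active, 0 bytes left. */
    struct sketch_qtd second = { 0x1020u, 0u };
    /* Overlay: Active clear, Current and Next both at the second qTD. */
    struct sketch_qh_overlay ov = { second.dma, second.dma, 0u };

    assert(qh_looks_stuck(&ov, &first));
}
```

A check like this could only flag the symptom after the fact; it says
nothing about why the late write-back happens.
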
> >
> > Although it's hard to tell exactly what went wrong, my guess is that
> > after the QH was removed from the async schedule, the controller
> > continued to process it until all 16 KB had been transferred. (This
> > would have taken no more than 0.5 ms.) Then at some point, the QH
> > overlay and the now-completed qTD were written back -- that would
> > explain why the second qTD in the queue shows up as not Active and with
> > no bytes remaining to transfer.
> >
> > On the other hand, that qTD wasn't reused until 10 ms after the QH was
> > removed from the schedule, and it was completely reinitialized before
> > reuse. The write-back must have occurred later than this; I have no
> > idea why. I also don't know why the write-back of the QH's overlay
> > region didn't overwrite the Next qTD pointer.
> >
> >
> > This is clearly a complicated problem. It's possible that we're simply
> > dealing with defective hardware, but I tend to doubt it. It seems more
> > likely that the problem is caused by improperly removing the active QH
> > from the async schedule. The driver does not follow the instructions
> > given in section 4.8.2 of the EHCI spec, which says that software
> > should not remove active QHs.
> >
> > [In practice it's not feasible to wait for an active QH to become
> > inactive before removing it, for several reasons. For one, the QH may
> > _never_ become inactive (if the endpoint NAKs indefinitely). For
> > another, the procedure given in the spec (deactivate the qTDs on the
> > queue) is racy, since the controller can perform a new overlay or
> > writeback at any time.]
> >

Alan, one question: what would happen if we never removed an active QH
from the async list?

--
Best Regards,
Peter Chen