On Tue, 27 Jul 2010, Hans Petter Selasky wrote:

> > How often do the Intel controllers follow the wrong pointer?
>
> The issue is 100% reproducible, and appeared to me like some kind of hardware
> bug. I checked my e-mail archive today, but could not find the e-mails where I
> debugged this issue. Sorry about that. If you want to find out more, you will
> have to set up a special test to check this out on various hardware yourself!
>
> Instructions:
>
> 1) Set up a TD chain like this:
>
> QTD(0xc2f8e480) at 0x0178e480:
> next=0x0178e400<> altnext=0x00000001<T>
> status=0x40000000: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
> cerr=0 pid=0 stat=ACTIVE
> buffer[0]=0x0de1e000
> buffer[1]=0x0de1f000
> buffer[2]=0x0de20000
> buffer[3]=0x0de21000
> buffer[4]=0x0de21000
> buffer_hi[0]=0x00000000
> buffer_hi[1]=0x00000000
> buffer_hi[2]=0x00000000
> buffer_hi[3]=0x00000000
> buffer_hi[4]=0x00000000

Without checking in detail, it looks like this wants to transfer 16 KB of
data. Since the altnext field is set to 1 (only the T bit, so the pointer is
marked invalid), a short packet will cause the controller to follow the "next"
pointer.

> QTD(0xc2f8e400) at 0x0178e400:
> next=0x0178e380<> altnext=0x00000001<T>
> status=0x40000080: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
> cerr=0 pid=0 stat=ACTIVE
> buffer[0]=0x0de22000
> buffer[1]=0x0de23000
> buffer[2]=0x0de24000
> buffer[3]=0x0de25000
> buffer[4]=0x0de25000
> buffer_hi[0]=0x00000000
> buffer_hi[1]=0x00000000
> buffer_hi[2]=0x00000000
> buffer_hi[3]=0x00000000
> buffer_hi[4]=0x00000000

This is much the same as the previous qTD.

> 2) Send the following byte sequence from the USB gadget on a HS BULK
> endpoint: 512 + ZLP, 1024 + ZLP, 2048 + ZLP, or 4096 + ZLP. Also try
> replacing the ZLP with a short packet. One of these cases should trigger the
> bug: the EHCI continues working on the next TD, but fills some garbage into
> the bytes bits of the status DWORD.
When you say "the next TD", do you mean the second TD above (at 0x0178e400)
or rather the TD that follows it (at 0x0178e380)? In each case, I would expect
the controller to store N bytes in the first 16-KB buffer (where N is 512,
1024, 2048, or 4096 respectively) and 0 bytes in the second buffer, and then
to move on to the following TD. If you had set altnext to some other value,
then the controller would behave differently.

> NOTE: Usually the software will see an interrupt and check the TDs, and
> then it will see a short packet and remove the TD chain. If the software is
> quick enough, no bug will trigger. If the software/interrupt handler gets
> delayed, there is a chance that the EHCI can receive data into the next TD
> pointed to by the next field.

But that's supposed to happen!

> 3) My conclusion: Avoid receiving more than 16K on any BULK IN endpoint per
> EHCI IRQ. Chaining on BULK OUT endpoints does not have this kind of bug.

Are you sure this is really a bug? It doesn't look that way to me. And if it
is a bug, why do you limit yourself to 16 KB per interrupt? Wouldn't it make
more sense to set the limit to one qTD per interrupt? (Note that a qTD may
want to transfer less than 16 KB.)

> The issue was found on INTEL controllers at least. I don't have the exact
> version. The original test was Mass Storage, where the MSC (SCSI + BOT
> protocol) device was short-terminating the data stage, and then the EHCI
> sometimes got the CSW (command status wrapper block) as well into the
> remaining part of the TD chain, and then I got a timeout when trying to read
> the TD a second time.
>
> I don't have any more information than this. Maybe something to investigate
> for you Linux guys?

Certainly, if there really does turn out to be a problem.

Alan Stern