On Monday 26 July 2010 20:30:01 Alan Stern wrote:
> [Moved to linux-usb because this is unrelated to libusb]
>
> On Mon, 26 Jul 2010, Hans Petter Selasky wrote:
> > Hi Alan,
> >
> > > > For example the use of the alt_next field in the EHCI TD on BULK
> > > > endpoints, to receive multiple short packets. On some chipsets it
> > > > works. On others it crashes the hardware. The EHCI spec. is
> > > > completely silent on the issue.
> >
> > I found out that the EHCI chip was using the qtd_next on short packet
> > instead of alt_next, and then corrupted the total-bytes field:
> >
> > http://svn.freebsd.org/viewvc/base?view=revision&revision=197682
> >
> > Search google for the commit ID, and possibly you will find the logs
> > showing this too.
>
> I couldn't find the logs or bug reports. Can you provide any URLs?
>
> > > No it isn't. Section 4.10.2 says very explicitly how the alt_next
> > > field should be used.
> >
> > Ok, right, but it doesn't change the fact: Short packets on EHCI TD's are
> > buggy and I really hope it is better with the XHCI :-)
>
> It's odd that I haven't encountered any reports about this problem.
> Maybe it's because the short packets tend to occur in the last TD of an
> URB, where we don't care if the queue keeps on running and so the
> alt_next pointer isn't set.
>

Hi,

> How often do the Intel controllers follow the wrong pointer?

The issue is 100% reproducible, and it looked to me like some kind of
hardware bug. I checked my e-mail archive today, but could not find the
e-mails where I debugged this issue. Sorry about that. If you want to find
out more, you will have to set up a special test and check this on various
hardware yourself!
Instructions:

1) Set up a TD chain like this:

QTD(0xc2f8e480) at 0x0178e480:
  next=0x0178e400<> altnext=0x00000001<T>
  status=0x40000000: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
    cerr=0 pid=0 stat=ACTIVE
  buffer[0]=0x0de1e000
  buffer[1]=0x0de1f000
  buffer[2]=0x0de20000
  buffer[3]=0x0de21000
  buffer[4]=0x0de21000
  buffer_hi[0]=0x00000000
  buffer_hi[1]=0x00000000
  buffer_hi[2]=0x00000000
  buffer_hi[3]=0x00000000
  buffer_hi[4]=0x00000000
QTD(0xc2f8e400) at 0x0178e400:
  next=0x0178e380<> altnext=0x00000001<T>
  status=0x40000080: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
    cerr=0 pid=0 stat=ACTIVE
  buffer[0]=0x0de22000
  buffer[1]=0x0de23000
  buffer[2]=0x0de24000
  buffer[3]=0x0de25000
  buffer[4]=0x0de25000
  buffer_hi[0]=0x00000000
  buffer_hi[1]=0x00000000
  buffer_hi[2]=0x00000000
  buffer_hi[3]=0x00000000
  buffer_hi[4]=0x00000000

2) From the USB gadget, send one of the following byte sequences on a HS
BULK endpoint:

  512 + ZLP or
  1024 + ZLP or
  2048 + ZLP or
  4096 + ZLP.

Also try replacing the ZLP with a short packet. One of these cases should
trigger the bug: the EHCI continues working on the next TD and writes
garbage into the bytes bits of the status DWORD.

NOTE: Usually the software will see an interrupt, check the TD's, see the
short packet and remove the TD chain. If the software is quick enough, the
bug will not trigger. If the software/interrupt handler gets delayed, there
is a chance that the EHCI will receive data into the next TD pointed to by
the next field.

3) My conclusion: Avoid receiving more than 16K on any BULK IN endpoint per
EHCI IRQ. Chaining on BULK OUT endpoints does not have this kind of bug.

The issue was found on Intel controllers at least. I don't have the exact
version.
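For anyone setting up such a test, here is a minimal sketch of how the
status DWORD in the dumps could be decoded and how a chain could be checked
for the symptom afterwards. It is not taken from any driver. The bit layout
is the qTD token layout from the EHCI specification; the names
qtd_token_decode() and chain_overrun_index() are made up for illustration,
and the check assumes that every TD in the chain was queued ACTIVE with the
same byte count (0x4000 in the dumps).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the qTD token DWORD, printed as "status=" in the dumps. */
struct qtd_token {
	unsigned toggle;  /* bit 31: data toggle */
	unsigned bytes;   /* bits 30:16: total bytes still to transfer */
	unsigned ioc;     /* bit 15: interrupt on complete */
	unsigned c_page;  /* bits 14:12: current page */
	unsigned cerr;    /* bits 11:10: error counter */
	unsigned pid;     /* bits 9:8: PID code */
	unsigned status;  /* bits 7:0: status byte */
	bool active;      /* status bit 7 */
};

static struct qtd_token qtd_token_decode(uint32_t token)
{
	struct qtd_token t;

	t.toggle = (token >> 31) & 0x1;
	t.bytes  = (token >> 16) & 0x7fff;
	t.ioc    = (token >> 15) & 0x1;
	t.c_page = (token >> 12) & 0x7;
	t.cerr   = (token >> 10) & 0x3;
	t.pid    = (token >> 8) & 0x3;
	t.status = token & 0xff;
	t.active = (t.status & 0x80) != 0;
	return t;
}

/*
 * Hypothetical check for the symptom described above. Returns the index
 * of the first TD that the controller touched after an earlier TD in the
 * chain completed short, or -1 if there is none.
 *
 * Assumption: every TD was queued ACTIVE with `queued_bytes` to transfer.
 * A TD that is no longer active but still has bytes left is treated as
 * "completed short"; a halted TD also matches, which this sketch does not
 * try to distinguish.
 */
static int chain_overrun_index(const uint32_t *tokens, size_t n,
    uint32_t queued_bytes)
{
	bool seen_short = false;
	size_t i;

	for (i = 0; i < n; i++) {
		struct qtd_token t = qtd_token_decode(tokens[i]);
		bool untouched = t.active && t.bytes == queued_bytes;

		if (seen_short && !untouched)
			return (int)i;
		if (!t.active && t.bytes != 0)
			seen_short = true;
	}
	return -1;
}
```

With the second TD from the dump, qtd_token_decode(0x40000080) gives
bytes=0x4000 with the active bit set. The idea is to snapshot the token of
every TD in the chain once the delayed interrupt handler finally runs and
to pass the snapshots to chain_overrun_index().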
The original test was Mass Storage, where the MSC (SCSI + BOT protocol)
device was short-terminating the data stage. The EHCI then sometimes
received the CSW (command status wrapper block) as well, into the remaining
part of the TD chain, and I got a timeout when trying to read the CSW a
second time.

I don't have any more information than this. Maybe something to investigate
for you Linux guys?

--HPS