Re: EHCI and short packets [was: Re: [Libusb-devel] USB 3.0]

Hans Petter Selasky <hselasky@xxxxxxx> · Tue, 27 Jul 2010 00:54:40 +0200

On Monday 26 July 2010 20:30:01 Alan Stern wrote:
> [Moved to linux-usb because this is unrelated to libusb]
> 
> On Mon, 26 Jul 2010, Hans Petter Selasky wrote:
> > Hi Alan,
> > 
> > > > For example the use of the alt_next field in the EHCI TD on BULK
> > > > endpoints, to receive multiple short packets. On some chipsets it
> > > > works. On others it crashes the hardware. The EHCI spec. is
> > > > completely silent on the issue.
> > 
> > I found out that the EHCI chip was using the qtd_next on short packet
> > instead of alt_next, and then corrupted the total-bytes field:
> > 
> > http://svn.freebsd.org/viewvc/base?view=revision&revision=197682
> > 
> > Search google for the commit ID, and possibly you will find the logs
> > showing this too.
> 
> I couldn't find the logs or bug reports.  Can you provide any URLs?
> 
> > > No it isn't.  Section 4.10.2 says very explicitly how the alt_next
> > > field should be used.
> > 
> > Ok, right, but it doesn't change the fact: Short packets on EHCI TD's are
> > buggy and I really hope it is better with the XHCI :-)
> 
> It's odd that I haven't encountered any reports about this problem.
> Maybe it's because the short packets tend to occur in the last TD of an
> URB, where we don't care if the queue keeps on running and so the
> alt_next pointer isn't set.
> 

Hi,

> How often do the Intel controllers follow the wrong pointer?

The issue is 100% reproducible, and it looked to me like some kind of hardware 
bug. I checked my e-mail archive today, but could not find the e-mails from 
when I debugged this issue. Sorry about that. If you want to find out more, you 
will have to set up a special test and check this on various hardware yourself!

Instructions:

1) Set up a TD chain like this:

QTD(0xc2f8e480) at 0x0178e480:
next=0x0178e400<> altnext=0x00000001<T>
status=0x40000000: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
cerr=0 pid=0 stat=ACTIVE
buffer[0]=0x0de1e000
buffer[1]=0x0de1f000
buffer[2]=0x0de20000
buffer[3]=0x0de21000
buffer[4]=0x0de21000
buffer_hi[0]=0x00000000
buffer_hi[1]=0x00000000
buffer_hi[2]=0x00000000
buffer_hi[3]=0x00000000
buffer_hi[4]=0x00000000
QTD(0xc2f8e400) at 0x0178e400:
next=0x0178e380<> altnext=0x00000001<T>
status=0x40000080: toggle=0 bytes=0x4000 ioc=0 c_page=0x0
cerr=0 pid=0 stat=ACTIVE
buffer[0]=0x0de22000
buffer[1]=0x0de23000
buffer[2]=0x0de24000
buffer[3]=0x0de25000
buffer[4]=0x0de25000
buffer_hi[0]=0x00000000
buffer_hi[1]=0x00000000
buffer_hi[2]=0x00000000
buffer_hi[3]=0x00000000
buffer_hi[4]=0x00000000

2) From the USB gadget, send one of the following byte sequences on a HS BULK 
IN endpoint: 512 + ZLP, 1024 + ZLP, 2048 + ZLP or 4096 + ZLP. Also try 
replacing the ZLP with a short packet. One of these cases should trigger the 
bug: the EHCI continues working on the next TD, while writing garbage into the 
total-bytes bits of the status DWORD.

NOTE: Usually the software will see an interrupt and check the TDs, and then 
it will see a short packet and remove the TD chain. If the software is quick 
enough, the bug will not trigger. If the software/interrupt handler gets 
delayed, there is a chance that the EHCI will receive data into the next TD, 
the one pointed to by the next field.

3) My conclusion: Avoid receiving more than 16K on any BULK IN endpoint per 
EHCI IRQ. Chaining on BULK OUT endpoints does not have this kind of bug.
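
To make the symptom from step 2 easier to check, here is a rough sketch that 
decodes the status DWORDs from the dump in step 1 and flags the case where the 
first TD ended short but the second TD was processed anyway. The bit layout is 
the qTD token layout as I read it in the EHCI 1.0 spec. The names 
decode_token() and next_td_was_touched() are made up for this example and do 
not come from any driver.

```python
# Sketch only: field layout assumed from the EHCI 1.0 qTD token DWORD.
from typing import NamedTuple

STATUS_ACTIVE = 0x80  # bit 7 of the status byte
STATUS_HALTED = 0x40  # bit 6 of the status byte


class QtdToken(NamedTuple):
    toggle: int
    bytes_left: int  # "Total Bytes to Transfer", counts down as data arrives
    ioc: int
    c_page: int
    cerr: int
    pid: int
    status: int

    @property
    def active(self) -> bool:
        return bool(self.status & STATUS_ACTIVE)

    @property
    def halted(self) -> bool:
        return bool(self.status & STATUS_HALTED)


def decode_token(dword: int) -> QtdToken:
    """Split a qTD token DWORD into the fields shown in the dump."""
    return QtdToken(
        toggle=(dword >> 31) & 0x1,
        bytes_left=(dword >> 16) & 0x7FFF,
        ioc=(dword >> 15) & 0x1,
        c_page=(dword >> 12) & 0x7,
        cerr=(dword >> 10) & 0x3,
        pid=(dword >> 8) & 0x3,
        status=dword & 0xFF,
    )


def next_td_was_touched(first: int, second: int, queued_len: int) -> bool:
    """True if the first TD ended short and the second TD was used anyway.

    queued_len is the byte count the second TD was queued with (hypothetical
    parameter; 0x4000 for the chain in step 1).
    """
    a, b = decode_token(first), decode_token(second)
    first_short = (not a.active) and (not a.halted) and a.bytes_left > 0
    second_changed = (not b.active) or b.bytes_left != queued_len
    return first_short and second_changed
```

With the chain from step 1, a first TD that received 512 bytes would read back 
with bytes=0x3e00 and the ACTIVE bit cleared. If the second TD still reads 
0x40000080, the controller stopped as expected; anything else matches the 
behaviour described in step 2.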

The issue was found on Intel controllers at least. I don't have the exact 
version. The original test was Mass Storage, where the MSC (SCSI + BOT 
protocol) device was short-terminating the data stage, and then the EHCI 
sometimes got the CSW (command status wrapper block) as well into the remaining 
part of the TD chain, and then I got a timeout when trying to read the TD a 
second time.
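
For the Mass Storage case, one way to confirm that the CSW really landed in 
the data buffer of a later TD is to look for it there. This is only a sketch, 
assuming the 13-byte CSW layout from the Bulk-Only Transport spec (signature 
"USBS", tag, data residue, status). The helper name find_stray_csw() and the 
idea of passing it the start of the second TD's buffer are my own assumptions 
for illustration.

```python
# Sketch only: CSW layout assumed from the USB Mass Storage Bulk-Only
# Transport spec (dCSWSignature, dCSWTag, dCSWDataResidue, bCSWStatus).
import struct

CSW_LEN = 13
CSW_SIGNATURE = 0x53425355  # the bytes "USBS" on the wire (little-endian)


def find_stray_csw(data: bytes, expected_tag: int):
    """Return (residue, status) if data starts with a CSW carrying
    expected_tag, otherwise None."""
    if len(data) < CSW_LEN:
        return None
    sig, tag, residue, status = struct.unpack_from("<IIIB", data, 0)
    if sig != CSW_SIGNATURE or tag != expected_tag:
        return None
    return residue, status
```

If this returns something other than None for the buffer of the TD following 
the short one, the status stage was swallowed by the data stage's TD chain, 
which would explain the timeout on the later CSW read.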

I don't have any more information than this. Maybe something to investigate 
for you Linux guys?

--HPS