Re: [patch 2.6.29] usb: ehci-sched.c: EHCI SITD scheduling bugfix (resend)

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Thu, 16 Apr 2009 12:05:39 -0400 (EDT)

On Wed, 15 Apr 2009, Dan Streetman wrote:

> > Why? Â This follows from the constraint (implicit in the USB spec and
> > stated explicitly in the EHCI spec) that in any uframe, CSPLITs must be
> > sent in the same order as their corresponding SSPLITs. Â Since an siTD
> > will always precede a QH in the periodic list's chain of pointers, the
> > isoc CSPLIT would precede the interrupt CSPLIT if they both occurred in
> > the same uframe -- and that would violate the constraint.
> 
> Yep, you are right on that point - so to adhere to spec, the TT
> scheduler really does need to rebalance the TT schedule to put all the
> interrupt transfers after isoc transfers.

Or else start the isoc transfers at least three uframes after any 
interrupts transfers start.

> > What is the reason? Â It's because a TT isn't required to buffer more
> > than 188 bytes per uframe.
> 
> I'm not sure where you see that in the spec?  I'm looking at 11.19,
> which says the start split buffer is (minimum) 752 bytes.

That's right; the buffer isn't required to hold more than 4 x 188
bytes.  Now in 11.18.6.2, the spec says that a start-split received in
uframe X must be dropped at the start of uframe X+4.  (This is also
mentioned in 11.18.7.)  This means that the TT must be able to buffer
the start-split during uframes X, X+1, X+2, and X+3.  In other words,
the buffer has to be large enough to hold four uframes' worth of
start-split data.

So if the host delivers more than 188 bytes during any of those four
uframes, the buffer might overflow.  Of course this will depend on the
design; a TT might be able to handle more than 188 bytes during a
particular uframe if the surrounding uframes contain fewer than 188.  
But it might not.  And you certainly wouldn't want to send a sequence
like: 188, 188, 188, 192 (as would happen with a 692-byte isoc-OUT
followed by a 64-byte interrupt-OUT).

In short, the spec is ambiguous on this matter, meaning that
manufacturers are free to interpret it as they like.  To be safe, we
should be conservative and never schedule an OUT transfer that crosses
a uframe boundary unless it occupies the entire preceding uframe.

> I disagree with your interpretation.  I believe that you can issue a
> 188 byte SSPLIT even if you know there is another transfer going on,
> and even if you have issued another smaller transfer (less than 188
> bytes obviously) earlier in that uframe.  The TT has a (minimum) 752
> byte SSPLIT buffer that it uses to buffer transfers up and execute
> when possible.

Doesn't this contradict your earlier statement that a multi-uframe 
transfer has to be scheduled so that its first SSPLIT (and each of the 
following 188-byte SSPLITs) occupies an otherwise empty uframe?  After 
all, a 188-byte transfer that begins in the same uframe as another 
smaller transfer will certainly be multi-uframe.

Or did you really mean "> 188 bytes" when you said "multi-uframe"
earlier?

> > I agree that the -1 fix is needed. Â But I don't agree that the
> > resulting code is bug-free. Â For example, suppose there's nothing but a
> > 64-byte interrupt transfer in uframe 0, and you want to add a 150-byte
> > isoc-OUT. Â How will the carryover logic either prevent this or force
> > the isoc-OUT to start no earlier than uframe 3?
> 
> Eh?  I don't follow you, it should start before uframe 3, in fact it
> should start immediately after the 64-byte interrupt transfer.

It can't start immediately _after_ the interrupt transfer.  The EHCI
hardware design doesn't permit this; siTDs always have to occur before
QHs in the pointer chain.

>  The
> best thing to do is schedule both of them in uframe 0, and let the TT
> run both of them in uframe 1 and the remaining part of the interrupt
> transfer will carry over on the full-speed bus to uframe 2.

This would force the TT to buffer 216 bytes during uframe 0 (assuming
the interrupt is also OUT).  As I argued above, this isn't safe.

> > Another issue -- all the calculations in tt_available() and
> > carryover_tt_bandwith() are done in usec instead of bytes. Â This
> > renders them subject to roundoff errors.
> 
> The USB spec provides the functions that must be used to calculate
> times, sec 5.11.3.  We could use nsecs instead of usecs, although I
> don't think that rounding impacts much.  The USB spec does say in that
> section that "different implementations may choose to use coarser
> approximations of these times".

You are mixing up two different kinds of timing calculations.  The
formulas is 5.11.3 are "worst case", meant for enforcing the 90%
limit on periodic transfers.  But the periodic TT scheduler uses "best
case" calculations, should be done in terms of bytes.  Otherwise you 
run a risk of going over the 188-byte limit.

The spec itself shares a little of this same confusion.  For instance, 
11.18.1 is nominally about best-case scheduling, but the 1157 value it 
computes is actually a worst-case figure.

> > Finally, one last question. Â The specs (and the code) mention in
> > several places that with proper scheduling, more than 16 full-speed
> > transactions will never occur within a single uframe. Â That doesn't
> > seem right to me.
> >
> > For example, why can't 17 two-byte isoc transfers occur within 125
> > usec? Â As I see it, the bandwidth required per isoc transfer is:
> >
> > Â  Â  Â  Â token packet (1-byte sync plus 3-byte packet)
> > Â  Â  Â  Â data packet (1-byte sync, 1-byte PID, 2 bytes of data,
> > Â  Â  Â  Â  Â  Â  Â  Â 2-byte CRC)
> >
> > which is 10 bytes (plus a fraction for the inter-packet gaps). Â 17 of
> > these would occupy less than 187.5 byte times if there's no
> > bit-stuffing. Â And if the transfers were 0 bytes long then there would
> > be enough time for 22 of them!
> >
> > So what am I missing?
> 
> Using the isoc out function specified in the USB spec sec 5.11.3, with
> a 1-byte isoc, no bitstuffing, and minimum think time (8 bits or
> 666ns), the per transfer time is 7849.94ns.  That only fits 16 times
> into 125ms (really, about 600ns over).

Well, Table 5-4 says that a single frame can accomodate 136 full-speed 
2-byte isoc transfers and 150 1-byte transfers.  Both these values are 
at least as large as 8*17 = 136, which implies that some of the 
uframes would contain at least 17 transfers.

The formulas in 5.11.3 aren't appropriate here, since they give 
worst-case values.  But the TT scheduling is supposed to be best case.

Incidentally, it's a little surprising to see the limitations of the 
split-transaction approach.  A full-speed controller would have no 
difficulty handling four 200-byte isoc-OUT transfers during a frame, 
but EHCI just can't do it!  (Each transfer would have to occupy two 
uframes with no overlap, and that isn't permitted.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html