Analysis of USB 2.0 HUB TT scheduling + suggested fixes

Hans de Goede <hdegoede@xxxxxxxxxx> · Mon, 25 Oct 2010 13:05:29 +0200

Hi All,

The last 3 days I've been working on (*) getting a few usb-1.1
webcams to work when connected through a usb-2.0 hub.

*) And also doing a lot of reading of ehci-sched.c and the usb 2.0
standard.

This mail is an attempt to organize my thoughts, point out
what I believe are bugs in the current code and suggest some
enhancements.

Note the below discusses the CONFIG_USB_EHCI_TT_NEWSCHED code, some
may apply to the old code to, some may not.

###

When discussing linux interrupt and isoc urb TT scheduling it
is important to note that the current ehci code does *NOT*
support carrying complete splits over from the current frame
to the next frame. This means that:

1) For interrupt packets uframes 5 and 6 can not be used.
   The current code does allow interrupt packets to use
   uframe 5, but this is wrong. The spec says that the
   uframe in which an interrupt packet is scheduled must
   be followed by *3* complete splits, these 3 would fall in
   uframe 6, uframe 7 and uframe 0 *of the next frame*
   Which we do not support, so the check in
   ehci-sched.c: check_intr_schedule () :

           if (qh->c_usecs && uframe >= 6)         /* FSTN territory? */

   Needs to be changed to:

           if (qh->c_usecs && uframe >= 5)         /* FSTN territory? */

   Also see the TODO note slightly further down this function

2) For isoc in packets the rules are similar to interrupt
   rules, except that the isoc in packet can span multiple
   uframes so the rule is that all uframes following the 1st
   used uframe until the last used one + 3, must have
   complete splits. so say an isoc in packet uses uframes 0 and 1
   then the frame must have complete splits in uframes 1, 2, 3, 4.

   This means that just like interrupts we cannot use uframes 5 and
   6 to schedule isoc in packets, because that would put the last
   complete split in uframe 0 of the next frame.

   There is an exception in the spec for iso frames which start in
   uframe 0, but that is of no use to us because of the way the EHCI
   controller deals with hostframe / busframe timings means that a complete
   split in uframe 7 also nees to be carried over to the next frame.
   See below...

The above means that for both interrupt and isoc packets we can only
uframes 0 - 4 for scheduling, but this does not take hostframe / busframe
timings into account. The problem is that for a usb 1.1 packet to be send /
received in uframe0, requires the start split packet to be send in uframe7
of the previous frame. This is a pain both software and hardware wise. So
the EHCI controller has the concept of bus frames (what I've been talking
about sofar) and host frames. And a host frame starts at uframe 7 of the
previous bus frame. This is why the ehci code can schedule a start split in
the same uframe as we want the actual usb-1.1 transfer to happen, because
when we set a smask bit for uframe0, we are setting it in host uframe0,
which is uframe7 of the previous bus frame.

This does mean however that we cannot schedule a complete split in *bus*
uframe7, as that is host uframe0 of the next frame (oops).

So this means that we loose another uframe as we do not support scheduling
complete splits for the next host frame. And now we can only use uframes 0-3
for scheduling, loosing a significant chunk of bandwidth.

This also means that complete splits must not be scheduled in the uframe
after the one which we have scheduled the bandwidth in, but one later. Since
If we schedule for bus uframe0, this means we're sending a start split in
host uframe0, having the actual data transfer in host uframe1 and thus
should schedule the complete split in host uframe2 (which is bus uframe1).

Looking at the code again:

1) The interrupt code gets this pretty much wrong. It allows usage
   of uframes 0-5, while it should only allow uframes 0-3, also it
   calculates the cmask wrong! It starts the complete splits for an
   interrupt transfer whose start split is in uframe0 in uframe1, where
   it should start them in uframe2.

2) The isoc code gets this almost right, it has a check to see if complete
   splits are located in the next *host* frame (so bus uframe7 or later)
   and then denies the schedule. This is enforced by this check in
   ehci-sched.c: sitd_slot_ok():

        /* for IN, don't wrap CSPLIT into the next frame */
        if (mask & ~0xffff)
                return 0;

   But it fails to do this check when calulating the initial mask in
   ehci-sched.c: iso_stream_init():

                        /* c-mask as specified in USB 2.0 11.18.4 3.c */
                        tmp = (1 << (hs_transfers + 2)) - 1;
                        stream->raw_mask |= tmp << (8 + 2);

   Where stream->raw_mask is an u16, resulting in excess bits being thrown
   away. I think this was done intentionally to build in the exception in
   the spec where isoc transfers which start in bus uframe0, do not need any
   complete splits in bus uframe0 of the next frame, even though the usual
   rules say the do. However as stated before this exception is of no use to
   us as we do not support crossing *host* frame boundaries with complete
   splits, and the *bus* frame exception does not allow us to also drop
   the complete split in bus uframe7, which we are doing here by throwing
   away any bits above bit nummer 15.

   I've seen this cause misfunction of some 1.1 webcams when used through a
   2.0 hub. When I lower their altsetting to one of 752 bytes or lower,
   meaning that their isoc packets only span bus uframe0-3 (assuming they
   are the only user of the bus), the problems with these webcams goes
   away (at the cost of a lower framerate).

###

So taking a moment pause here, the above means that we need to make
a number of changed to ehci-sched.c to tighten various checks. This may
break some current cases where a high amount of bandwidth is needed and
the missing complete splits are not an issue. My testing shows that some
1.1 webcams will work fine with a packet size of 850-900 through a 2.0
hub despite the missing complete split in bus uframe7, they seem to
simply be quick enough (not using the maximum response times in the 1.1
spec) to have all their data send before uframe7.

Still tightening the checks is the right thing to do, as it will allow
devices which currently do not work, but can fallback to a lower alt
setting to work, and following the spec is the right thing todo :)

###

Have you already taken that pause yet? Because I'm not done yet I'm
afraid. As said above some webcams use fine working a packet size of
850-900 bytes. But they can use packet sizes up to 1023 bytes and the
truncating of the initial cmask in iso_stream_init() means that asking
for 1023 bytes will not trip the complete splits which span the host
frame boundary check. Yet ehci-sched.c does not allow streaming at
1023 bytes.

The reason that ehci-sched.c does not allow streaming at 1023 bytes is
that the tt_usecs it is calculating for scheduling purposes is wrong.

If you look at max_tt_usecs in ehci-sched.c you will see that only a
small part of the 6th bus uframe can be used and that the 7th bus uframe
cannot be used at all. The reason for this is the best full speed budget
scheduling as explained in 11.18.1 of the USB-2.0 spec. The reason we
cannot use those last 2 uframes is because of 10% reservation for control
and bulk messages (which explains 100 of the circa 225 missing usecs) and
because this takes worst case bitstuffing into account (which explains
the rest).

But we use usb_calc_bus_time() to calculate the time a packet will
use, which also takes worst case bit stuffing into account. So we account
for bitstuffing twice, which cause us to be unable to schedule 1023 byte
isoc packets, even though we should be able to (if we forget about the
frame boundary crossing issues).

The USB-2.0 spec clearly state that when budgetting for the TT we should
use a best case speed of 188 bytes / uframe, rather then the worst case
one, this is taken into account in the non availability of uframe 6 and 7
for scheduling. I think the reason for this is that the tt must be fed at
a speed suitable for the best case scenario so that it does not run out
of data to send to the usb-1.1 device.

So we will need to add a new ehci-sched.c private usb_calc_tt_bus_time()
which is a copy of usb_calc_bus_time() with the bit stuffing factor
removed.

Also the usec budget which we have in max_tt_usecs for uframe6 is wrong,
it says 30 usecs, but the spec talks about 30 (well 29 really) bytes not
usecs. Which translates to 20 usecs.

###

I'm afraid I'm still not done yet. Feel free to get yourself another
cup of coffee :)

In tt_available() the following check is done:

                /* special case for isoc transfers larger than 125us:
                 * the first and each subsequent fully used uframe
                 * must be empty, so as to not illegally delay
                 * already scheduled transactions
                 */
                if (125 < usecs) {
                        int ufs = (usecs / 125);
                        int i;
                        for (i = uframe; i < (uframe + ufs) && i < 8; i++)
                                if (0 < tt_usecs[i]) {
                                        ehci_vdbg(ehci,
                                                "multi-uframe xfer can't fit"
                                                "in frame %d uframe %d\n",
                                                frame, i);
                                        return 0;
                                }
                }

This code is too lax. Also the comment about "illegally delay"-ing is a bit
strange. I believe that this code is meant to avoid the following scenario
(in bus uframes)

1) uframe0 has an int or isoc transfer scheduled taking 50usecs,
   lets call this transfer a
2) tt_available gets called for a new multi uframe spanning isoc transer (b)
   starting in uframe0 of 150usecs
3) all other uframes are empty so there is plenty of space, still
   tt_available should return false for scheduling this transfer in uframe0
   as isoc transfers are added to the head of the queue for this frame,
   so if allowed the schedule will look like this

   uframe0 usecs 0 - 74 First 75 usecs of transfer b
   uframe0 usecs 75-125 All 50 usecs of transfer a
   uframe1 usecs 0 - 74 Last 75 usecs of transfer b

   Notice how transfer b is cut into 2, not good.

The above code will successfully protect against the above scenario, but
is easily beaten:

1) uframe0 has an int or isoc transfer scheduled taking 50usecs,
   lets call this transfer a
2) tt_available gets called for a new multi uframe spanning isoc transer (b)
   starting in uframe0 of 100usecs
3) all other uframes are empty so there is plenty of space, still
   tt_available should return false for scheduling this transfer in uframe0
   as isoc transfers are added to the head of the queue for this frame,
   so if allowed the schedule will look like this

   uframe0 usecs 0 - 74 First 75 usecs of transfer b
   uframe0 usecs 75-124 All 50 usecs of transfer a
   uframe1 usecs 0 - 24 Last 25 usecs of transfer b

   Notice how transfer b is still cut into 2

In this case the above code will not trigger,

Another problem is that the last part of a multi uframe transfer does
not get protected against getting pushed later into the uframe:

1) tt_available gets called for a new multi uframe spanning isoc transer (a)
   starting in uframe0 of 150usecs, this will use uframe0 completely and
   part of uframe1

2) tt_available gets called for a new isoc transer (b) of 50 usecs,
   starting in uframe1

3) There is space in uframe1, still tt_available should return false for
   scheduling this transfer in uframe0 as isoc transfers are added to the
   head of the queue for this frame, so if allowed the schedule will look
   like this:

   uframe0 usecs 0 - 124 First 125 usecs of transfer a
   uframe0 usecs 0 -  49 All 50 usecs of transfer b
   uframe1 usecs 50 - 74 Last 25 usecs of transfer a

   Notice how transfer a is cut into 2

This is what I mean with the check is too lax. Now we can easily make the
above code catch the second scenario. Note that when extensing the check
we should not limit it to isoc transfers. Given our always do interrupt
transfers last in the uframe scheduling we do *not* want to split
interrupt transfers either (also the interrupt cmask setting code
does not take splitting into account). So any time adding the transfer to
the current uframe will overflow it and cause the transfer to be split,
tt_available should return false, unless the current uframe is empty (
which means we will never split interrupt transfers as they always fit into
an empty uframe).

But catching the 3th scenario is not so simple. And not allowing scheduling
anything into the same uframe as the last uframe of a multi uframe isoc
transer makes our already not so flexible scheduling even more inflexible and
waste large parts of microframes when it encounters multi microframe spanning
transfers.

The 3th scenario can more easily and better be fixed by putting 1.1 isoc
transfers after already present 1.1 isoc transfers in the same frame
(interrupt transfers are already kept at the end of the per frame transfer
queue).

Note that once we append 1.1 isoc transfers to the queue wrt already
present isoc transfers that we then can start a multipart isoc
transfer safely in a uframe which already has isoc transfers
(but not int!) transfers scheduled, leading to more efficient scheduling.
This does mean that the cmask calculation then needs to be moved to
tt_available as the number of splits won't be known until scheduled.

###

Phew that was long. I hope I will get some feedback on this as that is the
main reason for me writing this :)

The second reason for writing this is to have something to point to for the
why of a number of patches I plan to create and submit soon fixing all of
the above issues.

Once all of the above issues are fixed, things may all be to the spec, but
limiting all transfers to uframe0-3 is not good, so I plan to work on FSTN
support as time allows. Atleast for isoc transfers as doing FSTN support
for isoc transfers is something which I think can be implemented in a sane
manner (unlike interrupt transfer FSTN stuff).

When we do this (add FSTN support for isoc but not for interrupt transfers)
it might be a good idea to reserver the 1st uframe for interrupt transfers
(except when a really large isoc request arrives).

Regards,

Hans

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html