Hi All, The last 3 days I've been working on (*) getting a few usb-1.1 webcams to work when connected through a usb-2.0 hub. *) And also doing a lot of reading of ehci-sched.c and the usb 2.0 standard. This mail is an attempt to organize my thoughts, point out what I believe are bugs in the current code and suggest some enhancements. Note the below discusses the CONFIG_USB_EHCI_TT_NEWSCHED code, some may apply to the old code to, some may not. ### When discussing linux interrupt and isoc urb TT scheduling it is important to note that the current ehci code does *NOT* support carrying complete splits over from the current frame to the next frame. This means that: 1) For interrupt packets uframes 5 and 6 can not be used. The current code does allow interrupt packets to use uframe 5, but this is wrong. The spec says that the uframe in which an interrupt packet is scheduled must be followed by *3* complete splits, these 3 would fall in uframe 6, uframe 7 and uframe 0 *of the next frame* Which we do not support, so the check in ehci-sched.c: check_intr_schedule () : if (qh->c_usecs && uframe >= 6) /* FSTN territory? */ Needs to be changed to: if (qh->c_usecs && uframe >= 5) /* FSTN territory? */ Also see the TODO note slightly further down this function 2) For isoc in packets the rules are similar to interrupt rules, except that the isoc in packet can span multiple uframes so the rule is that all uframes following the 1st used uframe until the last used one + 3, must have complete splits. so say an isoc in packet uses uframes 0 and 1 then the frame must have complete splits in uframes 1, 2, 3, 4. This means that just like interrupts we cannot use uframes 5 and 6 to schedule isoc in packets, because that would put the last complete split in uframe 0 of the next frame. There is an exception in the spec for iso frames which start in uframe 0, but that is of no use to us because of the way the EHCI controller deals with hostframe / busframe timings means that a complete split in uframe 7 also nees to be carried over to the next frame. See below... The above means that for both interrupt and isoc packets we can only uframes 0 - 4 for scheduling, but this does not take hostframe / busframe timings into account. The problem is that for a usb 1.1 packet to be send / received in uframe0, requires the start split packet to be send in uframe7 of the previous frame. This is a pain both software and hardware wise. So the EHCI controller has the concept of bus frames (what I've been talking about sofar) and host frames. And a host frame starts at uframe 7 of the previous bus frame. This is why the ehci code can schedule a start split in the same uframe as we want the actual usb-1.1 transfer to happen, because when we set a smask bit for uframe0, we are setting it in host uframe0, which is uframe7 of the previous bus frame. This does mean however that we cannot schedule a complete split in *bus* uframe7, as that is host uframe0 of the next frame (oops). So this means that we loose another uframe as we do not support scheduling complete splits for the next host frame. And now we can only use uframes 0-3 for scheduling, loosing a significant chunk of bandwidth. This also means that complete splits must not be scheduled in the uframe after the one which we have scheduled the bandwidth in, but one later. Since If we schedule for bus uframe0, this means we're sending a start split in host uframe0, having the actual data transfer in host uframe1 and thus should schedule the complete split in host uframe2 (which is bus uframe1). Looking at the code again: 1) The interrupt code gets this pretty much wrong. It allows usage of uframes 0-5, while it should only allow uframes 0-3, also it calculates the cmask wrong! It starts the complete splits for an interrupt transfer whose start split is in uframe0 in uframe1, where it should start them in uframe2. 2) The isoc code gets this almost right, it has a check to see if complete splits are located in the next *host* frame (so bus uframe7 or later) and then denies the schedule. This is enforced by this check in ehci-sched.c: sitd_slot_ok(): /* for IN, don't wrap CSPLIT into the next frame */ if (mask & ~0xffff) return 0; But it fails to do this check when calulating the initial mask in ehci-sched.c: iso_stream_init(): /* c-mask as specified in USB 2.0 11.18.4 3.c */ tmp = (1 << (hs_transfers + 2)) - 1; stream->raw_mask |= tmp << (8 + 2); Where stream->raw_mask is an u16, resulting in excess bits being thrown away. I think this was done intentionally to build in the exception in the spec where isoc transfers which start in bus uframe0, do not need any complete splits in bus uframe0 of the next frame, even though the usual rules say the do. However as stated before this exception is of no use to us as we do not support crossing *host* frame boundaries with complete splits, and the *bus* frame exception does not allow us to also drop the complete split in bus uframe7, which we are doing here by throwing away any bits above bit nummer 15. I've seen this cause misfunction of some 1.1 webcams when used through a 2.0 hub. When I lower their altsetting to one of 752 bytes or lower, meaning that their isoc packets only span bus uframe0-3 (assuming they are the only user of the bus), the problems with these webcams goes away (at the cost of a lower framerate). ### So taking a moment pause here, the above means that we need to make a number of changed to ehci-sched.c to tighten various checks. This may break some current cases where a high amount of bandwidth is needed and the missing complete splits are not an issue. My testing shows that some 1.1 webcams will work fine with a packet size of 850-900 through a 2.0 hub despite the missing complete split in bus uframe7, they seem to simply be quick enough (not using the maximum response times in the 1.1 spec) to have all their data send before uframe7. Still tightening the checks is the right thing to do, as it will allow devices which currently do not work, but can fallback to a lower alt setting to work, and following the spec is the right thing todo :) ### Have you already taken that pause yet? Because I'm not done yet I'm afraid. As said above some webcams use fine working a packet size of 850-900 bytes. But they can use packet sizes up to 1023 bytes and the truncating of the initial cmask in iso_stream_init() means that asking for 1023 bytes will not trip the complete splits which span the host frame boundary check. Yet ehci-sched.c does not allow streaming at 1023 bytes. The reason that ehci-sched.c does not allow streaming at 1023 bytes is that the tt_usecs it is calculating for scheduling purposes is wrong. If you look at max_tt_usecs in ehci-sched.c you will see that only a small part of the 6th bus uframe can be used and that the 7th bus uframe cannot be used at all. The reason for this is the best full speed budget scheduling as explained in 11.18.1 of the USB-2.0 spec. The reason we cannot use those last 2 uframes is because of 10% reservation for control and bulk messages (which explains 100 of the circa 225 missing usecs) and because this takes worst case bitstuffing into account (which explains the rest). But we use usb_calc_bus_time() to calculate the time a packet will use, which also takes worst case bit stuffing into account. So we account for bitstuffing twice, which cause us to be unable to schedule 1023 byte isoc packets, even though we should be able to (if we forget about the frame boundary crossing issues). The USB-2.0 spec clearly state that when budgetting for the TT we should use a best case speed of 188 bytes / uframe, rather then the worst case one, this is taken into account in the non availability of uframe 6 and 7 for scheduling. I think the reason for this is that the tt must be fed at a speed suitable for the best case scenario so that it does not run out of data to send to the usb-1.1 device. So we will need to add a new ehci-sched.c private usb_calc_tt_bus_time() which is a copy of usb_calc_bus_time() with the bit stuffing factor removed. Also the usec budget which we have in max_tt_usecs for uframe6 is wrong, it says 30 usecs, but the spec talks about 30 (well 29 really) bytes not usecs. Which translates to 20 usecs. ### I'm afraid I'm still not done yet. Feel free to get yourself another cup of coffee :) In tt_available() the following check is done: /* special case for isoc transfers larger than 125us: * the first and each subsequent fully used uframe * must be empty, so as to not illegally delay * already scheduled transactions */ if (125 < usecs) { int ufs = (usecs / 125); int i; for (i = uframe; i < (uframe + ufs) && i < 8; i++) if (0 < tt_usecs[i]) { ehci_vdbg(ehci, "multi-uframe xfer can't fit" "in frame %d uframe %d\n", frame, i); return 0; } } This code is too lax. Also the comment about "illegally delay"-ing is a bit strange. I believe that this code is meant to avoid the following scenario (in bus uframes) 1) uframe0 has an int or isoc transfer scheduled taking 50usecs, lets call this transfer a 2) tt_available gets called for a new multi uframe spanning isoc transer (b) starting in uframe0 of 150usecs 3) all other uframes are empty so there is plenty of space, still tt_available should return false for scheduling this transfer in uframe0 as isoc transfers are added to the head of the queue for this frame, so if allowed the schedule will look like this uframe0 usecs 0 - 74 First 75 usecs of transfer b uframe0 usecs 75-125 All 50 usecs of transfer a uframe1 usecs 0 - 74 Last 75 usecs of transfer b Notice how transfer b is cut into 2, not good. The above code will successfully protect against the above scenario, but is easily beaten: 1) uframe0 has an int or isoc transfer scheduled taking 50usecs, lets call this transfer a 2) tt_available gets called for a new multi uframe spanning isoc transer (b) starting in uframe0 of 100usecs 3) all other uframes are empty so there is plenty of space, still tt_available should return false for scheduling this transfer in uframe0 as isoc transfers are added to the head of the queue for this frame, so if allowed the schedule will look like this uframe0 usecs 0 - 74 First 75 usecs of transfer b uframe0 usecs 75-124 All 50 usecs of transfer a uframe1 usecs 0 - 24 Last 25 usecs of transfer b Notice how transfer b is still cut into 2 In this case the above code will not trigger, Another problem is that the last part of a multi uframe transfer does not get protected against getting pushed later into the uframe: 1) tt_available gets called for a new multi uframe spanning isoc transer (a) starting in uframe0 of 150usecs, this will use uframe0 completely and part of uframe1 2) tt_available gets called for a new isoc transer (b) of 50 usecs, starting in uframe1 3) There is space in uframe1, still tt_available should return false for scheduling this transfer in uframe0 as isoc transfers are added to the head of the queue for this frame, so if allowed the schedule will look like this: uframe0 usecs 0 - 124 First 125 usecs of transfer a uframe0 usecs 0 - 49 All 50 usecs of transfer b uframe1 usecs 50 - 74 Last 25 usecs of transfer a Notice how transfer a is cut into 2 This is what I mean with the check is too lax. Now we can easily make the above code catch the second scenario. Note that when extensing the check we should not limit it to isoc transfers. Given our always do interrupt transfers last in the uframe scheduling we do *not* want to split interrupt transfers either (also the interrupt cmask setting code does not take splitting into account). So any time adding the transfer to the current uframe will overflow it and cause the transfer to be split, tt_available should return false, unless the current uframe is empty ( which means we will never split interrupt transfers as they always fit into an empty uframe). But catching the 3th scenario is not so simple. And not allowing scheduling anything into the same uframe as the last uframe of a multi uframe isoc transer makes our already not so flexible scheduling even more inflexible and waste large parts of microframes when it encounters multi microframe spanning transfers. The 3th scenario can more easily and better be fixed by putting 1.1 isoc transfers after already present 1.1 isoc transfers in the same frame (interrupt transfers are already kept at the end of the per frame transfer queue). Note that once we append 1.1 isoc transfers to the queue wrt already present isoc transfers that we then can start a multipart isoc transfer safely in a uframe which already has isoc transfers (but not int!) transfers scheduled, leading to more efficient scheduling. This does mean that the cmask calculation then needs to be moved to tt_available as the number of splits won't be known until scheduled. ### Phew that was long. I hope I will get some feedback on this as that is the main reason for me writing this :) The second reason for writing this is to have something to point to for the why of a number of patches I plan to create and submit soon fixing all of the above issues. Once all of the above issues are fixed, things may all be to the spec, but limiting all transfers to uframe0-3 is not good, so I plan to work on FSTN support as time allows. Atleast for isoc transfers as doing FSTN support for isoc transfers is something which I think can be implemented in a sane manner (unlike interrupt transfer FSTN stuff). When we do this (add FSTN support for isoc but not for interrupt transfers) it might be a good idea to reserver the 1st uframe for interrupt transfers (except when a really large isoc request arrives). Regards, Hans -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html