Ian,

I would appreciate it if in future you would not copy patch descriptions
over from dccp@vger to dccp@ietf. Apart from the fact that I don't like
it, this creates the wrong idea among people who have little or nothing
to do with actual protocol implementation: it produces an impression of
"let's talk about some implementation bugs". (But competent
implementation feedback is welcome and solicited on dccp@vger.)

This is all the more regrettable since you are right in raising this
point as a general one: it is indeed a limitation of [RFC 3448, 4.6]
with regard to non-realtime OSes.

To clarify, the two main issues of this limitation (I, II) and a third,
less serious one (III) are summarised below.

I. Uncontrollable speeds
------------------------
Non-realtime OSes schedule processes in discrete timeslices with a
granularity of t_gran = 1/HZ. When packets are scheduled using this
mechanism, this naturally limits the maximum packet rate to HZ packets
per second.

There are two speeds involved here: the packet rate `A' of the
application (user-space), and the allowed sending rate `X' determined
by the TFRC mechanism (kernel-space).

These speeds are not related to one another. The allowed sending rate X
will, under normal circumstances, approach the link bandwidth, following
the principles of slow start. The application sending rate A may or may
not be fixed.

No major problems arise as long as A stays below X. Numerical example:
A=32kbps, X=94Mbps (standard 100 Mb Ethernet link speed). When loss
occurs, X is reduced according to p. As long as X remains above A, the
sender can send as before; if X is reduced below A, the sender will be
limited to X.

Now the problem: when the application rate A is above s * HZ, there is a
range of speeds where the TFRC mechanism is effectively out of control,
i.e. requests to reduce the sending rate in response to congestion
events (ECN-marked or lost packets) will not be followed.

Numerical example: HZ=1000/sec, X=94Mbps, A=59Mbps, s=1500 bytes.
The controllable limit is s * HZ = 1500 * 8 * 1000 bps = 12Mbps. Assume
loss occurs in steady-state such that X is to be reduced to X_reduced.
Then, if s * HZ < X_reduced <= A, nothing will happen and the effective
speed after computing X_reduced will remain at A. This is even more
problematic if A is not fixed but could increase above its current rate.

So, with regard to the numerical example, nothing will happen if
X_reduced is between 12Mbps and 59Mbps: the speed after the congestion
event will remain at A=59Mbps.

The problem is even more serious when considering that Gigabit NICs are
standard in most laptops and desktop PCs; here X will ramp up even
higher, so that the range for mayhem is even greater. (Standard Linux
even comes with 10 Gbit Ethernet drivers.)

Again: the problem is that TFRC/CCID3 cannot control speeds above
s * HZ on a non-realtime operating system.

In car manufacturer terms, this is like a car whose accelerator is
functional but, somewhere in its range, switches to top speed.
Obviously, they would not be allowed to sell cars with such a
deficiency.

A safer solution, therefore, would be to insert a throttle to limit
application speeds below s * HZ, to keep applications from stealing
bandwidth which they are not supposed to use.

II. Accumulation of send credits
--------------------------------
This second problem is also conceptual and is described as accumulation
of send credits. It has been discussed on this list before; please refer
to those threads for a more detailed description of how this comes
about.

The relevant point here is that accumulation of send credits will also
happen as a natural consequence of using [RFC 3448, 4.6] on non-realtime
operating systems.
The reason is that the use of discrete time slices leads to a
quantisation problem, where t_nom is always set earlier than would be
required by the exact formula: 0.9 msec becomes 0 msec, 1.7 msec becomes
1 msec, 2.8 msec becomes 2 msec and so forth (this assumes HZ=1000; it
is even worse with lower values of HZ).

Thus, after a few packets, the sender will be "too early" by the sum
total of the quantisation errors that have occurred so far. In the given
numerical example, the sender is skewed by (0.9 + 0.7 + 0.8) msec = 2.4
msec, which will be broken into a send credit of 2 msec plus a remainder
of 0.4 msec, which might clear at a later stage.

In addition, this will lead to speeds which are typically faster than
allowed by the exact value of t_nom: measurements have shown that in the
``linear'' range of speeds below s * HZ, the real implementation is more
than 3 times faster than allowed by the sending rate X = s/t_ipi.

III. Accumulation of inaccuracies
---------------------------------
Due to context switch latencies, interrupt handling, and processing
overhead, scheduling-based packet pacing will not send packets at
exactly the scheduled time; they may be sent slightly earlier or later.
This is another source from which send credits can accumulate, but it is
not fully understood yet. It would require measurements to see how far
off the scheduling is on average. It does seem, however, that this
problem is less serious than I/II; scheduling inaccuracies might cancel
each other out over the long term.

NOTE: numerical examples serve to illustrate the principle only. Please
do not interpret this as an invitation for discussion of numerical
examples.

Thanks.
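
For anyone who wants to reproduce the arithmetic of sections I and II,
here is a minimal Python sketch. It is an illustration under simplifying
assumptions, not the Linux CCID3 code: the function names are made up
for this example, the pacing model assumes at most one paced packet per
timer tick, delays are rounded down to whole ticks, and HZ is assumed to
divide one million evenly.

```python
"""Illustrative model only; all names are hypothetical, not kernel code."""


def controllable_limit_bps(s_bytes, hz):
    """Highest rate in bits/sec that one packet per timer tick can pace."""
    return s_bytes * 8 * hz


def effective_rate_bps(x_bps, a_bps, s_bytes, hz):
    """Rate actually sent under the simplified model of section I.

    Assumption: once the allowed rate X exceeds s * HZ, the inter-packet
    interval rounds down to zero ticks, so packets leave as fast as the
    application supplies them (rate A).
    """
    if x_bps > controllable_limit_bps(s_bytes, hz):
        return a_bps
    return min(a_bps, x_bps)


def quantisation_credit(ideal_delays_us, hz):
    """Sum the truncation errors described in section II.

    Each ideal delay (in microseconds) is rounded down to a whole number
    of ticks. The accumulated error is returned as a pair:
    (whole ticks of send credit, remainder in microseconds).
    """
    tick_us = 1_000_000 // hz  # assumes HZ divides 1,000,000 evenly
    skew_us = sum(d % tick_us for d in ideal_delays_us)
    return divmod(skew_us, tick_us)


if __name__ == "__main__":
    s, hz, a = 1500, 1000, 59_000_000
    print(controllable_limit_bps(s, hz))               # 12000000
    for x in (94_000_000, 30_000_000, 10_000_000):
        print(x, effective_rate_bps(x, a, s, hz))
    print(quantisation_credit([900, 1700, 2800], hz))  # (2, 400)
```

With the values from section I, any X between 12Mbps and 59Mbps leaves
the modelled sender at A=59Mbps, and the three delays from section II
give a credit of 2 ticks (2 msec at HZ=1000) plus 400 usec.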

| On 4/18/07, Lars Eggert <lars.eggert@xxxxxxxxx> wrote:
| > On 2007-4-18, at 19:16, ext Colin Perkins wrote:
| > > On 11 Apr 2007, at 23:45, Ian McDonald wrote:
| > >> On 4/12/07, Gerrit Renker <gerrit@xxxxxxxxxxxxxx> wrote:
| > >>> There is no way to stop a Linux CCID3 sender from ramping X up to
| > >>> the link bandwidth of 1 Gbit/sec; but the scheduler can only
| > >>> control packet pacing up to a rate of s * HZ bytes per second.
| > >>
| > >> Let's start to think laterally about this. Many of the problems around
| > >> CCID3/TFRC implementation seem to be on local LANs and rtt is less
| > >> than t_gran. We get really badly affected by how we do x_recv etc and
| > >> the rate is basically all over the show. We get affected by send
| > >> credits and numerous other problems.
| > >
| > > As a data point, we've seen similar stability issues with our
| > > user-space TFRC implementation, although at somewhat larger RTTs
| > > (order of a few milliseconds or less). We're still checking whether
| > > these are bugs in our code, or issues with TFRC, but this may be a
| > > broader issue than problems with the Linux DCCP implementation.
| >
| > I think Vlad saw similar issues with the KAME code when running over
| > a local area network. (Vlad?)
| >
| > Lars
|
| -
| To unsubscribe from this list: send the line "unsubscribe dccp" in
| the body of a message to majordomo@xxxxxxxxxxxxxxx
| More majordomo info at http://vger.kernel.org/majordomo-info.html