Quoting Eddie Kohler:
| > Fix:
| > ----
| > Avoid any backlog of sending time which is greater than one whole t_ipi. This
| > permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
| > the disproportionally large bursts.
|
| Actually this does not permit coarse granularity bursts, since it limits
| the maximum burst size to 2 packets. That is not sufficient for high
| rates and medium-to-low granularities and it is far stricter than TCP.
|
The comment affects the commit message. I can change that if you like.

With regard to the remainder: First is the issue with TCP. As shown
below, increasing the allowed lag beyond one full t_ipi will effectively
increase the sending rate beyond the allowed rate X, which means that
the sender sends more per RTT than the throughput equation allows.

With regard to being stricter, we do respect RFC 4340, 3.6,
  `DCCP implementations will follow TCP's "general principle of
   robustness": "be conservative in what you do, be liberal in what you
   accept from others" [RFC793].'

Finally, the main reason for using a tighter value on the maximum lag is
to protect against problems with high-speed hardware. Commodity PCs
already have Gigabit ethernet cards and the Linux stack scales up to
that speed nicely. Unfortunately, unless one implements real-time
extensions to pace the packets, there will always be slack and an
accumulation of send credits. These will accrue for the simple reason
that a t_ipi of 1.6 milliseconds becomes 1 millisecond, and a t_ipi of
0.9 milliseconds becomes 0 milliseconds. There is no way to stop a Linux
CCID3 sender from ramping X up to the link bandwidth of 1 Gbit/sec, but
the scheduler can only control packet pacing up to a rate of s * HZ
bytes per second. Therefore, if we allow slack in the scheduling lag,
the bursts on systems which use Gbit or even 10-Gbit ethernet cards will
become astronomically large. It is thus safer to choose the more
restrictive value.
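To put rough numbers on the pacing limit described above, here is a
minimal user-space sketch in C. The helper names and the sample values
(s = 1500 bytes, HZ = 1000) are assumptions made for illustration only;
this is not code from ccid3.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Truncate a delay given in microseconds to whole milliseconds, in the
 * same way as the integer division `delay / 1000L` does.
 */
static long usecs_to_whole_msecs(long delay_us)
{
	return delay_us / 1000L;
}

/*
 * Highest rate (bytes per second) that can be paced if at most one
 * packet of s bytes is released per timer tick. `hz` is a hypothetical
 * stand-in for the kernel's HZ.
 */
static uint64_t max_paced_rate(uint32_t s, uint32_t hz)
{
	assert(hz > 0);
	return (uint64_t)s * hz;
}

/*
 * Whole packets that would have to leave per tick in order to reach an
 * allowed rate of x bytes per second.
 */
static uint64_t packets_per_tick(uint64_t x, uint32_t s, uint32_t hz)
{
	assert(s > 0 && hz > 0);
	return x / ((uint64_t)s * hz);
}
```

With these assumed values, usecs_to_whole_msecs(1600) is 1 and
usecs_to_whole_msecs(900) is 0; max_paced_rate(1500, 1000) is 1.5
Mbyte/sec, whereas 1 Gbit/sec is 125 Mbyte/sec, so that
packets_per_tick(125000000, 1500, 1000) comes to 83 packets per tick.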
This is, of course, a regrettable compromise. But doing the scheduling
right _and_ safe requires real-time extensions or busy-wait threads (not
sure that they will find much favour). The same topic has been discussed
several times on this mailing list.

C o n c l u s i o n :
=====================
The patch fixes a serious problem which will occur in any application
using CCID3, due to realistically possible conditions such as
  * a low sending rate and/or
  * silence periods and/or
  * scheduling inaccuracies (as described above).

I therefore still want it in!

|
| >          D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
| >          ------------------------------------------------------------------
| > Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
| > n is a natural number and t_r < t_ipi. Then
| >
| >         t_nom - t_now = - (n*t_ipi + t_r)
| >
| > First consider n=0: the current packet is sent immediately, and for
| > the next one the send time is
| >
| >         t_nom' = t_nom + t_ipi = t_now + (t_ipi - t_r)
| >
| > Thus the next packet is sent t_r time units earlier. The result is
| > burstier traffic, as the inter-packet spacing is reduced; this
| > burstiness is mentioned by [RFC 3448, 4.6].
| >
| > Now consider n=1. This case is illustrated below
| >
| >         |<----- t_ipi -------->|<-- t_r -->|
| >
| >         |----------------------|-----------|
| >         t_nom                              t_now
| >
| > Not only can the next packet be sent t_r time units earlier, a third
| > packet can additionally be sent at the same time.
| >
| > This case can be generalised in that the packet scheduling mechanism
| > now acts as a Token Bucket Filter whose bucket size equals n: when
| > n=0, a packet can only be sent when the next token arrives. When n>0,
| > a burst of n packets can be sent immediately in addition to the tokens
| > which arrive with rate rho = 1/t_ipi.
| >
| > The aim of CCID 3 is an on average smooth traffic with allowed sending
| > rate X.
| > The following determines the required bucket size n for the
| > purpose of achieving, over the period of one RTT R, an average allowed
| > sending rate X.
| > The number of bytes sent during this period is X*R. Tokens arrive with
| > rate rho at the bucket, whose size n shall be determined now. Over the
| > period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
| > each token represents a packet of size s. Hence we have the equation
| >
| >              s * (n + R * rho) = X * R
| >         <=>  n + R/t_ipi       = X/s * R  =  R / t_ipi
| >
| > which shows that n must be 0. Hence we can not allow a `credit' of
| > t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
| >
| >
| > Signed-off-by: Gerrit Renker <gerrit@xxxxxxxxxxxxxx>
| > ---
| >  net/dccp/ccids/ccid3.c |   12 ++++++++++--
| >  1 file changed, 10 insertions(+), 2 deletions(-)
| >
| > --- a/net/dccp/ccids/ccid3.c
| > +++ b/net/dccp/ccids/ccid3.c
| > @@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
| >  	case TFRC_SSTATE_NO_FBACK:
| >  	case TFRC_SSTATE_FBACK:
| >  		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
| > -		ccid3_pr_debug("delay=%ld\n", (long)delay);
| > +		/*
| > +		 * Lagging behind for more than a full t_ipi: when this occurs,
| > +		 * a send credit accrues which causes packet storms, violating
| > +		 * even the average allowed sending rate. This case happens if
| > +		 * the application idles for some time, or if it emits packets
| > +		 * at a rate smaller than X/s. Avoid such accumulation.
| > +		 */
| > +		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi < 0)
| > +			hctx->ccid3hctx_t_nom = now;
| >  		/*
| >  		 * Scheduling of packet transmissions [RFC 3448, 4.6]
| >  		 *
| > @@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
| >  		 * else
| >  		 *       // send the packet in (t_nom - t_now) milliseconds.
| >  		 */
| > -		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
| > +		else if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
| >  			return delay / 1000L;
| >
| >  		ccid3_hc_tx_update_win_count(hctx, &now);
| > -
| > To unsubscribe from this list: send the line "unsubscribe dccp" in
| > the body of a message to majordomo@xxxxxxxxxxxxxxx
| > More majordomo info at http://vger.kernel.org/majordomo-info.html
|
|
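
The effect of the check added by the quoted patch can also be tried
outside the kernel. The following is only a simplified sketch under
stated assumptions: all times are plain microsecond counters, delta is
taken to be zero, a packet is released whenever delay <= 0, and t_nom is
advanced by t_ipi for each packet sent. The function name burst_len and
the `clamp' flag are hypothetical and do not appear in ccid3.c.

```c
#include <assert.h>

/*
 * Count how many packets could be sent back-to-back at time `now`.
 * If `clamp` is non-zero, a lag of more than one full t_ipi is
 * forgotten by resetting t_nom to now, mirroring the test
 * `delay + t_ipi < 0` in the patch.
 */
static unsigned int burst_len(long t_nom, long now, long t_ipi, int clamp)
{
	unsigned int sent = 0;

	assert(t_ipi > 0);
	for (;;) {
		long delay = t_nom - now;

		if (clamp && delay + t_ipi < 0)
			t_nom = now;	/* discard the accrued send credit */
		else if (delay > 0)
			break;		/* next packet is not yet due */

		sent++;			/* packet goes out now */
		t_nom += t_ipi;		/* schedule the following packet */
	}
	return sent;
}
```

For t_ipi = 1000 and a lag of 5500 (that is, n = 5 and t_r = 500),
burst_len(0, 5500, 1000, 0) yields 6 packets, while with the clamp
burst_len(0, 5500, 1000, 1) yields 1. With a lag of exactly one t_ipi,
burst_len(0, 1000, 1000, 1) yields 2, which under these assumptions is
the largest burst the clamp lets through.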