There have been discussions on whether `s' is fixed in CCID 2/3 or whether,
in spite of RFC 4341/4342, `s' is variable. To turn that discussion into
something which is practically implementable and yet does not defy existing
standards-track documents, a simple algorithmic strategy is suggested below.

An aside to the discussion was the PACKET_SIZE socket option. In practice
this option only causes unnecessary confusion: for CCID 2 it is entirely
irrelevant, and for CCID 3 (and its variations) it is redundant, for the
reasons argued below.

Strategy
--------
 1/ Remove the PACKET_SIZE socket options as they don't help with the
    problem; I have therefore updated Ian's patch to be used standalone
    [attached].
 2/ In the initialisation code, set ccid3hctx_s = ccid3hcrx_s = 0 instead
    of TFRC_STD_PACKET_SIZE (which would give wrong estimates).
 3/ Update ccid3hc{rx,tx}_s using the `payload len' value taken each time
    from
      * dccp_sendmsg     (after the check against dccp_mss_cache)
      * dccp_v{4,6}_rcv  (only on Data/DataAck packets)
    using a moving average with a large weight q = 8/10 ... 9/10:
          s <-- q * s + (1-q) * len
 4/ Don't use MTU minus header length for `s' (justified below).

Thus, if the application is well-behaved and sends only `fixed' size
packets, step (3) reduces to a no-op. Otherwise, `s' slowly converges to a
long-term value. The optimal weight q can be found through experimentation,
but 9/10 seems conservative enough.

The algorithm above is in accordance with recent modifications suggested in
[HFPW06]. It could be extended to guard against excessive changes in packet
length for CCID 2 as suggested in [RFC 4341, sec. 5.3].

R A T I O N A L E

The problem with assuming that `s' may vary, and with allowing it to be set
to some other (but fixed) value such as the path MTU minus header/option
lengths, lies in the changes this requires to the loss rate estimation
algorithm.
References which explicitly warn against this are given below. Both
[Wid00, p. 21] and [FHP+00, 3.1.2] point out that this part has taken much
discussion and testing, and for good reason: any change endangers both
efficiency and fairness wrt competing TCP flows.

Theorems and numerical examples showing that inaccuracies lead to either
non-TCP-friendly or suboptimal application behaviour can be found in [RR99];
these trends were later confirmed by a much more comprehensive analysis in
[VLB05]. These findings were validated and confirmed by Widmer et al in
[WBLB04]. This article, as well as the earlier technical report [Vas00],
warns against using the MTU as the `fixed' packet size parameter of the
throughput equation in scenarios where the application is allowed to send
variable-sized packets.

To solve the problem of a non-`fixed' s, Widmer et al introduce a number of
changes to the loss estimation algorithm in [WBLB04]. Similar ideas, and a
confirmation that the loss interval estimation needs updating when `s' may
vary, can be found in [FK06], where a constant of s=1460 is plugged into the
throughput equation. Consequently, adjustments to the estimation of the loss
event rate follow, and an upper bound on the sending rate is additionally
imposed to support using a non-`fixed' s.

In summary, using variable packet sizes is not well understood and even less
well specified. There are several publications which explicitly warn against
clamping `s' to the path MTU [Vas00, WBLB04, VLB05, RR99] and thereby
allowing applications to be liberal with (the length of) what they send.

References
----------
[RR99]   Ramesh, Sridhar and Injong Rhee. Issues in TCP Model-Based Flow
         Control. Technical report, TR-99-15, NCSU, North Carolina State
         University, Raleigh, 1999.
[VLB05]  Vojnovic, Milan and Jean-Yves Le Boudec. On the long-run behavior
         of equation-based rate control. IEEE/ACM Transactions on Networking
         (TON), 13(3):568--581, 6/2005.
[WBLB04] Widmer, Jörg, Catherine Boutremans and Jean-Yves Le Boudec.
         End-to-End Congestion Control for TCP-Friendly Flows with Variable
         Packet Size. ACM SIGCOMM Computer Communication Review,
         34(2):137--151, 4/2004.
[Vas00]  Vasallo, Pedro Reviriego. Variable Packet Size Equation Based
         Congestion Control. Technical Report, tr-00-008, ICSI, 4/2000.
[FK06]   Floyd, Sally and Eddie Kohler. TCP Friendly Rate Control (TFRC):
         the Small-Packet (SP) Variant. draft-ietf-dccp-tfrc-voip-05.txt,
         1/3/2006.
[HFPW06] draft-floyd-rfc3448bis-00.txt
[Wid00]  Widmer, Jörg. Equation-Based Congestion Control. Diploma Thesis,
         Department of Mathematics and Computer Science, University of
         Mannheim, Germany, 2/2000.
[FHP+00] Floyd, Sally, Mark Handley, Jitendra Padhye and Jörg Widmer.
         Equation-Based Congestion Control for Unicast Applications. ACM
         SIGCOMM Computer Communication Review, 30(4):43--56, 10/2000.
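
For illustration only, the moving-average update from step 3/ of the
strategy could look roughly like the sketch below. It is not part of the
attached patch. The helper name tfrc_ewma_s and the TFRC_S_Q_* constants are
made up for this example, and treating s == 0 as "no sample yet" (so that
the first payload length is adopted directly) is an assumption layered on
top of step 2/, not something the text above prescribes.

```c
#include <stdint.h>

/* Hypothetical weight q = TFRC_S_Q_NUM / TFRC_S_Q_DEN = 9/10 (illustrative names). */
#define TFRC_S_Q_NUM	9u
#define TFRC_S_Q_DEN	10u

/*
 * Return the updated packet-size estimate:  s <-- q * s + (1-q) * len
 *
 * Assumption: s == 0 means that no packet has been seen yet, in which case
 * the first payload length is taken as the estimate.
 * 64-bit intermediates avoid overflow; the division truncates.
 */
static uint32_t tfrc_ewma_s(uint32_t s, uint32_t len)
{
	if (s == 0)
		return len;

	return (uint32_t)(((uint64_t)TFRC_S_Q_NUM * s +
			   (uint64_t)(TFRC_S_Q_DEN - TFRC_S_Q_NUM) * len) /
			  TFRC_S_Q_DEN);
}
```

With fixed-size packets the estimate stays put once it equals len, which is
the no-op case mentioned above. Since the integer division truncates, very
small differences between s and len may not move the estimate at all; a
scaled fixed-point representation would be one way around that.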
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index d6f4ec4..628035f 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -196,7 +196,6 @@ struct dccp_so_feat {
 };
 
 /* DCCP socket options */
-#define DCCP_SOCKOPT_PACKET_SIZE	1
 #define DCCP_SOCKOPT_SERVICE		2
 #define DCCP_SOCKOPT_CHANGE_L		3
 #define DCCP_SOCKOPT_CHANGE_R		4
@@ -464,7 +463,6 @@ struct dccp_sock {
 	struct dccp_service_list *dccps_service_list;
 	struct timeval dccps_timestamp_time;
 	__u32 dccps_timestamp_echo;
-	__u32 dccps_packet_size;
 	__u16 dccps_l_ack_ratio;
 	__u16 dccps_r_ack_ratio;
 	unsigned long dccps_ndp_count;
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index cec23ad..aa8f19e 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -652,11 +652,7 @@ static int ccid3_hc_tx_init(struct ccid
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid);
 
-	if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-	    dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-		hctx->ccid3hctx_s = dp->dccps_packet_size;
-	else
-		hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
+	hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
 
 	/* Set transmission rate to 1 packet per second */
 	hctx->ccid3hctx_x = hctx->ccid3hctx_s;
@@ -1125,11 +1121,7 @@ static int ccid3_hc_rx_init(struct ccid
 
 	ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk);
 
-	if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-	    dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-		hcrx->ccid3hcrx_s = dp->dccps_packet_size;
-	else
-		hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
+	hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
 
 	hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
 	INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index d3e6e81..69ba5c3 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -451,8 +451,7 @@ out_free_val:
 static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
 		char __user *optval, int optlen)
 {
-	struct dccp_sock *dp;
-	int err;
+	int err = 0;
 	int val;
 
 	if (optlen < sizeof(int))
@@ -465,14 +464,8 @@ static int do_dccp_setsockopt(struct soc
 		return dccp_setsockopt_service(sk, val, optval, optlen);
 
 	lock_sock(sk);
-	dp = dccp_sk(sk);
-	err = 0;
 
 	switch (optname) {
-	case DCCP_SOCKOPT_PACKET_SIZE:
-		dp->dccps_packet_size = val;
-		break;
-
 	case DCCP_SOCKOPT_CHANGE_L:
 		if (optlen != sizeof(struct dccp_so_feat))
 			err = -EINVAL;
@@ -568,10 +561,6 @@ static int do_dccp_getsockopt(struct soc
 	dp = dccp_sk(sk);
 
 	switch (optname) {
-	case DCCP_SOCKOPT_PACKET_SIZE:
-		val = dp->dccps_packet_size;
-		len = sizeof(dp->dccps_packet_size);
-		break;
 	case DCCP_SOCKOPT_SERVICE:
 		return dccp_getsockopt_service(sk, len,
 					       (__be32 __user *)optval, optlen);