PACKET_SIZE option and congestion control on variable-length packets

There has been discussion on whether the packet size `s' is fixed in CCID 2/3
or whether, despite RFC 4341/4342, `s' may vary. To turn the discussion into
something that is practically implementable and yet does not contradict existing
standards-track documents, a simple algorithmic strategy is suggested below.

A side issue in the discussion was the PACKET_SIZE socket option. In fact, this
option serves no purpose other than to cause unnecessary confusion: for CCID 2
it is entirely irrelevant, and for CCID 3 (and its variants) it is redundant,
for the reasons argued below.

Strategy
----------
1/ Remove the PACKET_SIZE socket option, as it does not help with the problem;
    I have therefore updated Ian's patch so that it can be used standalone [attached].
2/ In the initialisation code, set ccid3hctx_s = ccid3hcrx_s = 0 instead of
    TFRC_STD_PACKET_SIZE (which would give wrong estimates).
3/ Update ccid3hc{rx,tx}_s with the `payload len' value taken each time from
	* dccp_sendmsg (after the check against dccp_mss_cache)
	* dccp_v{4,6}_rcv (only on Data/DataAck packets)
    using a moving average with a large weight q, between 8/10 and 9/10:
       s  <--  q * s   +  (1-q) * len
4/ Do not use the path MTU minus header length for `s' (justified below).

Thus, if the application is well-behaved and sends only `fixed'-size packets, step
(3) reduces to a no-op. Otherwise, s slowly converges to a long-term average. The
optimal value of q can be found through experimentation, but 9/10 seems
conservative enough. The algorithm above is in accordance with recent modifications
suggested in [HFPW06]. It could be extended to guard against excessive changes in
packet length for CCID 2, as suggested in [RFC 4341, sec. 5.3].
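A minimal sketch of the update in step 3, in kernel-style C with integer
arithmetic. The function name tfrc_update_s and the macros S_Q_NUM/S_Q_DEN are
hypothetical, q is fixed at 9/10, and seeding s with the first sample while it
is still 0 is an added assumption: without it, a sender starting from the step-2
value s = 0 would not see the no-op behaviour described above for fixed-size
packets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical weight q = S_Q_NUM / S_Q_DEN = 9/10; the text suggests a
 * value between 8/10 and 9/10, to be tuned by experimentation.
 */
#define S_Q_NUM 9u
#define S_Q_DEN 10u

/*
 * Step 3: fold the payload length `len' of a packet into the packet-size
 * estimate s, i.e.  s <-- q * s + (1 - q) * len.
 * Integer arithmetic only, as is usual in kernel code; the division
 * truncates, so a growing s may settle a few bytes below the true average.
 *
 * Assumption (not part of the original strategy): while s still has its
 * step-2 initial value of 0, the first sample is taken as-is. Otherwise s
 * would start at (1 - q) * len and fixed-size traffic would still move it.
 */
static uint32_t tfrc_update_s(uint32_t s, uint32_t len)
{
	if (s == 0)
		return len;
	return (S_Q_NUM * s + (S_Q_DEN - S_Q_NUM) * len) / S_Q_DEN;
}
```

A caller would invoke it once per packet, e.g. s = tfrc_update_s(s, len), at the
send and receive points listed in step 3. With fixed 500-byte payloads, s stays at
500 from the first packet on; after a switch to 1000-byte payloads it moves up by
roughly a tenth of the remaining gap per packet.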

			R A T I O N A L E

The problem with assuming that `s' may vary, or with allowing it to be set to
some other (but fixed) value such as the path MTU minus header/option lengths,
is that it requires changes to the loss event rate estimation algorithm.

References that explicitly warn against this are given below. Both
[Wid00, p. 21] and [FHP+00, sec. 3.1.2] point out that this part of the
algorithm took much discussion and testing, for good reason: any change to it
endangers both efficiency and fairness with respect to competing TCP flows.

Theorems and numerical examples showing that such inaccuracies lead to either
non-TCP-friendly or suboptimal application behaviour can be found in [RR99];
these trends were later confirmed by a much more comprehensive analysis in [VLB05].

These findings were validated by Widmer et al. in [WBLB04]. This article, as well
as the earlier technical report [Vas00], warns against using the MTU as the `fixed'
packet size parameter of the throughput equation in scenarios where the application
may send variable-sized packets. To solve the problem of a non-`fixed' s, Widmer
et al. introduce a number of changes to the loss estimation algorithm in [WBLB04].

Similar ideas, and confirmation that loss interval estimation needs updating when `s' may
vary, can be found in [FK06], where a constant s = 1460 bytes is plugged into the throughput
equation. Consequently, the estimation of the loss event rate is adjusted, and an upper bound
on the sending rate is additionally imposed to support a non-`fixed' s.

In summary, the use of variable packet sizes is not well understood and even less well specified.
Several publications explicitly warn against clamping `s' to the path MTU [Vas00, WBLB04, VLB05,
RR99] and thereby allowing applications to be liberal with the length of what they send.


References
--------------
[RR99]       Ramesh, Sridhar and Injong Rhee. Issues in TCP Model-Based Flow
             Control. Technical report, TR-99-15, NCSU, North Carolina State
             University, Raleigh, 1999.

[VLB05]      Vojnovic, Milan and Jean-Yves Le Boudec. On the long-run behavior
             of equation-based rate control. IEEE/ACM Transactions on
             Networking (TON), 13(3):568--581, 6/2005.

[WBLB04]     Widmer, Jörg, Catherine Boutremans and Jean-Yves Le Boudec.
             End-to-End Congestion Control for TCP-Friendly Flows with
             Variable Packet Size. ACM SIGCOMM Computer Communication Review,
             34(2):137--151, 4/2004.

[Vas00]      Vasallo, Pedro Reviriego. Variable Packet Size Equation Based
             Congestion Control. Technical Report, tr-00-008, ICSI, 4/2000.

[FK06]       Floyd, Sally and Eddie Kohler. TCP Friendly Rate Control (TFRC):
             the Small-Packet (SP) Variant. draft-ietf-dccp-tfrc-voip-05.txt,
             1/3/2006.

[HFPW06]     draft-floyd-rfc3448bis-00.txt

[Wid00]      Widmer, Jörg. Equation-Based Congestion Control. Diploma Thesis,
             Department of Mathematics and Computer Science, University of
             Mannheim, Germany, 2/2000.

[FHP+00]     Floyd, Sally, Mark Handley, Jitendra Padhye and Jörg Widmer.
             Equation-Based Congestion Control for Unicast Applications. ACM
             SIGCOMM Computer Communication Review, 30(4):43--56, 10/2000.
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index d6f4ec4..628035f 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -196,7 +196,6 @@ struct dccp_so_feat {
 };
 
 /* DCCP socket options */
-#define DCCP_SOCKOPT_PACKET_SIZE	1
 #define DCCP_SOCKOPT_SERVICE		2
 #define DCCP_SOCKOPT_CHANGE_L		3
 #define DCCP_SOCKOPT_CHANGE_R		4
@@ -464,7 +463,6 @@ struct dccp_sock {
 	struct dccp_service_list	*dccps_service_list;
 	struct timeval			dccps_timestamp_time;
 	__u32				dccps_timestamp_echo;
-	__u32				dccps_packet_size;
 	__u16				dccps_l_ack_ratio;
 	__u16				dccps_r_ack_ratio;
 	unsigned long			dccps_ndp_count;
diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index cec23ad..aa8f19e 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -652,11 +652,7 @@ static int ccid3_hc_tx_init(struct ccid 
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct ccid3_hc_tx_sock *hctx = ccid_priv(ccid);
 
-	if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-	    dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-		hctx->ccid3hctx_s = dp->dccps_packet_size;
-	else
-		hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
+	hctx->ccid3hctx_s = TFRC_STD_PACKET_SIZE;
 
 	/* Set transmission rate to 1 packet per second */
 	hctx->ccid3hctx_x     = hctx->ccid3hctx_s;
@@ -1125,11 +1121,7 @@ static int ccid3_hc_rx_init(struct ccid 
 
 	ccid3_pr_debug("%s, sk=%p\n", dccp_role(sk), sk);
 
-	if (dp->dccps_packet_size >= TFRC_MIN_PACKET_SIZE &&
-	    dp->dccps_packet_size <= TFRC_MAX_PACKET_SIZE)
-		hcrx->ccid3hcrx_s = dp->dccps_packet_size;
-	else
-		hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
+	hcrx->ccid3hcrx_s = TFRC_STD_PACKET_SIZE;
 
 	hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
 	INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index d3e6e81..69ba5c3 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -451,8 +451,7 @@ out_free_val:
 static int do_dccp_setsockopt(struct sock *sk, int level, int optname,
 		char __user *optval, int optlen)
 {
-	struct dccp_sock *dp;
-	int err;
+	int err = 0;
 	int val;
 
 	if (optlen < sizeof(int))
@@ -465,14 +464,8 @@ static int do_dccp_setsockopt(struct soc
 		return dccp_setsockopt_service(sk, val, optval, optlen);
 
 	lock_sock(sk);
-	dp = dccp_sk(sk);
-	err = 0;
 
 	switch (optname) {
-	case DCCP_SOCKOPT_PACKET_SIZE:
-		dp->dccps_packet_size = val;
-		break;
-
 	case DCCP_SOCKOPT_CHANGE_L:
 		if (optlen != sizeof(struct dccp_so_feat))
 			err = -EINVAL;
@@ -568,10 +561,6 @@ static int do_dccp_getsockopt(struct soc
 	dp = dccp_sk(sk);
 
 	switch (optname) {
-	case DCCP_SOCKOPT_PACKET_SIZE:
-		val = dp->dccps_packet_size;
-		len = sizeof(dp->dccps_packet_size);
-		break;
 	case DCCP_SOCKOPT_SERVICE:
 		return dccp_getsockopt_service(sk, len,
 					       (__be32 __user *)optval, optlen);
