Re: regarding delayed ACK

On 20:37 Wed 16 Apr, Mulyadi Santosa wrote:
> Hi...
> 
> Reading this...
> 
> http://www.stuartcheshire.org/papers/NagleDelayedAck/

Interesting, but some things are weird:

>> If you receive a single lone TCP segment, you wait 100-200ms, on the
>> assumption that the receiving application will probably generate a response
>> of some kind. (E.g., every time sshd receives a keystroke, it typically
>> generates a character echo in response.) You don't want the TCP stack to
>> send an empty ACK followed by a TCP data packet 1ms later, every time, so
>> you delay a little, so you can combine the ACK and data packet into one.

Every TCP packet carries a "push" flag. It is set when the sender has no more
data to send, and the receiver then sends the ACK immediately. The delayed ACKs
are probably there to prevent a flood of unnecessary ACKs.
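
For completeness, a minimal sketch (not from the article): on Linux an
application can also ask for immediate ACKs itself via the TCP_QUICKACK socket
option. Note the kernel may fall back to delayed ACKs again, so it is usually
set again after every recv():

#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to ACK incoming segments on this socket right away. */
int disable_delayed_ack(int sock)
{
	int one = 1;

	if (setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) < 0) {
		perror("setsockopt(TCP_QUICKACK)");
		return -1;
	}
	return 0;
}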

>> The next thing to know is that Delayed ACK applies to a single packet. If a
>> second packet arrives, an ACK is generated immediately. So TCP will ACK
>> every second packet immediately. Send two packets, and you get an immediate
>> ACK. Send three packets, and you'll get an immediate ACK covering the first
>> two, then a 200ms pause before the ACK for the third.

Is there an RFC or any particular reason for doing it this way? I have seen this
in Linux, too, but it makes no sense to me. There is a timer which schedules the
delayed ACK. If the next packet is expected to arrive (the time between segments
is tracked in "ato") only after the delayed ACK would have been sent, the delayed
ACK gets sent immediately; this avoids a timer event. If the next packet is
expected soon and there is already a delayed ACK queued, the timeout is the
timeout of that queued ACK (it must not be delayed any further). If there is
none, the delayed ACK should time out after a fraction of the RTT (= ping time).
After 2xRTT the sender retransmits, so we may delay for at most 1xRTT minus the
line jitter and the scheduling delay. I have no clue what the code in
tcp_output.c/tcp_send_delayed_ack() does. Maybe it is broken and never gets
triggered because an ACK gets sent every two packets anyway...
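
To make the rule I mean more concrete, here is a rough userspace sketch; the
struct and helper names are made up for illustration, this is not the kernel
code:

struct conn {
	unsigned long now;          /* current time in ticks */
	unsigned long ato;          /* expected inter-segment gap */
	unsigned long srtt;         /* smoothed RTT estimate */
	int           ack_pending;  /* is a delayed ACK already queued? */
	unsigned long ack_timeout;  /* when that queued ACK fires */
};

void schedule_delayed_ack(struct conn *c,
			  void (*send_ack_now)(struct conn *),
			  void (*arm_ack_timer)(struct conn *, unsigned long))
{
	/* Delay at most a fraction of the RTT: the peer retransmits after
	 * roughly 2xRTT, so the ACK must leave well before 1xRTT has passed. */
	unsigned long max_delay = c->srtt / 4;

	/* No ACK queued yet: propose a deadline a fraction of the RTT away. */
	if (!c->ack_pending)
		c->ack_timeout = c->now + max_delay;

	/* If the next segment is expected only after the queued ACK would have
	 * fired anyway, sending now saves a timer event. */
	if (c->now + c->ato >= c->ack_timeout) {
		c->ack_pending = 0;
		send_ack_now(c);
		return;
	}

	/* Otherwise keep the existing (earlier) deadline; a queued ACK must
	 * not be pushed further into the future. */
	c->ack_pending = 1;
	arm_ack_timer(c, c->ack_timeout);
}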

I once made a patch for 2.6.16.5 which does what I think it should do. It
reduces the number of ACKs to a fraction. The only penalty seemed to be a
somewhat lower throughput on large bandwidth-delay networks when the congestion
window is the bottleneck, but I have not tested anything concerning latency. If
you are interested, here is the patch:

diff -Nur linux-2.6.16.5/net/ipv4/tcp.c linux/net/ipv4/tcp.c
--- linux-2.6.16.5/net/ipv4/tcp.c	2006-04-12 22:27:57.000000000 +0200
+++ linux/net/ipv4/tcp.c	2006-04-15 12:53:11.000000000 +0200
@@ -953,7 +953,7 @@
 		    * receive. */
 		if (icsk->icsk_ack.blocked ||
 		    /* Once-per-two-segments ACK was not sent by tcp_input.c */
-		    tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
+		    //tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
 		    /*
 		     * If this read emptied read buffer, we send ACK, if
 		     * connection is not bidirectional, user drained
diff -Nur linux-2.6.16.5/net/ipv4/tcp_input.c linux/net/ipv4/tcp_input.c
--- linux-2.6.16.5/net/ipv4/tcp_input.c	2006-04-12 22:27:57.000000000 +0200
+++ linux/net/ipv4/tcp_input.c	2006-04-15 12:18:43.000000000 +0200
@@ -3567,11 +3567,12 @@
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	    /* More than one full frame received... */
-	if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss
+	if (
+		//((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss
 	     /* ... and right edge of window advances far enough.
 	      * (tcp_recvmsg() will send ACK otherwise). Or...
 	      */
-	     && __tcp_select_window(sk) >= tp->rcv_wnd) ||
+	     //&& __tcp_select_window(sk) >= tp->rcv_wnd) ||
 	    /* We ACK each frame or... */
 	    tcp_in_quickack_mode(sk) ||
 	    /* We have out of order data. */
diff -Nur linux-2.6.16.5/net/ipv4/tcp_output.c linux/net/ipv4/tcp_output.c
--- linux-2.6.16.5/net/ipv4/tcp_output.c	2006-04-12 22:27:57.000000000 +0200
+++ linux/net/ipv4/tcp_output.c	2006-04-15 15:01:40.000000000 +0200
@@ -1967,51 +1967,25 @@
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int ato = icsk->icsk_ack.ato;
-	unsigned long timeout;
-
-	if (ato > TCP_DELACK_MIN) {
-		const struct tcp_sock *tp = tcp_sk(sk);
-		int max_ato = HZ/2;
-
-		if (icsk->icsk_ack.pingpong || (icsk->icsk_ack.pending & ICSK_ACK_PUSHED))
-			max_ato = TCP_DELACK_MAX;
-
-		/* Slow path, intersegment interval is "high". */
-
-		/* If some rtt estimate is known, use it to bound delayed ack.
-		 * Do not use inet_csk(sk)->icsk_rto here, use results of rtt measurements
-		 * directly.
-		 */
-		if (tp->srtt) {
-			int rtt = max(tp->srtt>>3, TCP_DELACK_MIN);
-
-			if (rtt < max_ato)
-				max_ato = rtt;
-		}
-
-		ato = min(ato, max_ato);
-	}
-
-	/* Stay within the limit we were given */
-	timeout = jiffies + ato;
+	const struct tcp_sock *tp = tcp_sk(sk);
 
 	/* Use new timeout only if there wasn't a older one earlier. */
-	if (icsk->icsk_ack.pending & ICSK_ACK_TIMER) {
-		/* If delack timer was blocked or is about to expire,
-		 * send ACK now.
-		 */
-		if (icsk->icsk_ack.blocked ||
-		    time_before_eq(icsk->icsk_ack.timeout, jiffies + (ato >> 2))) {
-			tcp_send_ack(sk);
-			return;
-		}
-
-		if (!time_before(timeout, icsk->icsk_ack.timeout))
-			timeout = icsk->icsk_ack.timeout;
+	if (!(icsk->icsk_ack.pending & ICSK_ACK_TIMER)) {
+		icsk->icsk_ack.timeout = jiffies + (tp->srtt>>4);
+		
+	}
+	
+	/* If delack timer was blocked or is about to expire,
+	* send ACK now.
+	*/
+	if (icsk->icsk_ack.blocked ||
+		time_before_eq(icsk->icsk_ack.timeout, jiffies + (ato >> 2))) {
+		tcp_send_ack(sk);
+		return;
 	}
+	
 	icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
-	icsk->icsk_ack.timeout = timeout;
-	sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout);
+	sk_reset_timer(sk, &icsk->icsk_delack_timer, icsk->icsk_ack.timeout);
 }
 
 /* This routine sends an ack and also updates the window. */






> I am thinking, does this implicitly means, on every OS that implements
> delayed ACK, i better send small data but continously, rather than big
> one but intermittently?

No. The article says:
1) Send your requests as fast as possible, one after another, even when you
   have not received a response yet.
2) Call send() with a big buffer. Then you may leave TCP_NODELAY off (see the
   sketch below).
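
As a minimal sketch of point 2 (the function and buffer size are illustrative,
not from the article): build the whole request in one buffer and hand it to the
kernel in a single send(), so Nagle can stay enabled, i.e. TCP_NODELAY stays off:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send header and body as one request in a single send() call. */
ssize_t send_request(int sock, const char *header, const char *body)
{
	char buf[8192];
	int len = snprintf(buf, sizeof(buf), "%s%s", header, body);

	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;

	/* One send() for the whole request: the stack can emit full-sized
	 * segments instead of a trickle of tiny ones. */
	return send(sock, buf, (size_t)len, 0);
}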
 
> And, what about the side effect of dynamic window size? in reality,
> window size is changed many times, so I think we might hit a case
> (likely) that many of of packed gets merged into one...building bigger
> packet than we expect..

The window size has *nothing* to do with the packet size. The stack should not
send a packet smaller than the maximum segment size as long as there is enough
data queued to send; otherwise you may run into the "silly window syndrome".
	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.homelinux.net




