Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday 29 September 2005 11:17 am, Alexey Kuznetsov wrote:
> Hello!
>
> > >Anyway, ignoring this puzzle, the following patch for 2.4 should help.
> > >
> > >
> > >--- net/ipv4/tcp_input.c.orig	2003-02-20 20:38:39.000000000 +0300
> > >+++ net/ipv4/tcp_input.c	2005-09-02 22:28:00.845952888 +0400
> > >@@ -343,8 +343,6 @@
> > >			app_win -= tp->ack.rcv_mss;
> > >		app_win = max(app_win, 2U*tp->advmss);
> > >
> > >-		if (!ofo_win)
> > >-			tp->window_clamp = min(tp->window_clamp, app_win);
> > >		tp->rcv_ssthresh = min(tp->window_clamp, 2U*tp->advmss);
> > >	}
> > >}
> >
> > I'm very happy to report that the above patch, applied to 2.6.12.6, seems
> > to have cured the TCP window problem we were experiencing.
>
> Good. I think the patch is to be applied to all mainstream kernels.

Has anyone looked at the patch I sent out on Sept 9?  It goes a few steps 
further, addressing some additional problems.  Original message below.

Thanks,
  -John

-----

This is a patch for discussion addressing some receive buffer growing issues.  
This is partially related to the thread "Possible BUG in IPv4 TCP window 
handling..." last week.

Specifically it addresses the problem of an interaction between rcvbuf 
moderation (receiver autotuning) and rcv_ssthresh.  The problem occurs when 
sending small packets to a receiver with a larger MTU.  (A very common case I 
have is a host with a 1500 byte MTU sending to a host with a 9k MTU.)  In 
such a case, the rcv_ssthresh code is targeting a window size corresponding 
to filling up the current rcvbuf, not taking into account that the new rcvbuf 
moderation may increase the rcvbuf size.

One hunk makes rcv_ssthresh use tcp_rmem[2] as the size target rather than 
rcvbuf.  The other changes the behavior when it overflows its memory bounds 
with in-order data so that it tries to grow rcvbuf (the same as with 
out-of-order data).

These changes should help my problem of mixed MTUs, and should also help the 
case from last week's thread I think.  (In both cases though you still need 
tcp_rmem[2] to be set much larger than the TCP window.)  One question is if 
this is too aggressive at trying to increase rcvbuf if it's under memory 
stress.

  -John


Signed-off-by: John Heffner <jheffner@xxxxxxx>
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -233,7 +233,7 @@ static int __tcp_grow_window(const struc
 {
 	/* Optimize this! */
 	int truesize = tcp_win_from_space(skb->truesize)/2;
-	int window = tcp_full_space(sk)/2;
+	int window = tcp_win_from_space(sysctl_tcp_rmem[2])/2;
 
 	while (tp->rcv_ssthresh <= window) {
 		if (truesize <= skb->len)
@@ -326,39 +326,18 @@ static void tcp_init_buffer_space(struct
 static void tcp_clamp_window(struct sock *sk, struct tcp_sock *tp)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
-	struct sk_buff *skb;
-	unsigned int app_win = tp->rcv_nxt - tp->copied_seq;
-	int ofo_win = 0;
 
 	icsk->icsk_ack.quick = 0;
 
-	skb_queue_walk(&tp->out_of_order_queue, skb) {
-		ofo_win += skb->len;
+	if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
+	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
+	    !tcp_memory_pressure &&
+	    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
+		sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
+				    sysctl_tcp_rmem[2]);
 	}
-
-	/* If overcommit is due to out of order segments,
-	 * do not clamp window. Try to expand rcvbuf instead.
-	 */
-	if (ofo_win) {
-		if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
-		    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
-		    !tcp_memory_pressure &&
-		    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0])
-			sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
-					    sysctl_tcp_rmem[2]);
-	}
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
-		app_win += ofo_win;
-		if (atomic_read(&sk->sk_rmem_alloc) >= 2 * sk->sk_rcvbuf)
-			app_win >>= 1;
-		if (app_win > icsk->icsk_ack.rcv_mss)
-			app_win -= icsk->icsk_ack.rcv_mss;
-		app_win = max(app_win, 2U*tp->advmss);
-
-		if (!ofo_win)
-			tp->window_clamp = min(tp->window_clamp, app_win);
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U*tp->advmss);
-	}
 }
 
 /* Receiver "autotuning" code.

[Index of Archives]     [Netdev]     [Ethernet Bridging]     [Linux 802.1Q VLAN]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Git]     [Bugtraq]     [Yosemite News and Information]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux PCI]     [Linux Admin]     [Samba]

  Powered by Linux