Patch "tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited" has been added to the 4.19-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-fix-tcp_cwnd_validate-to-not-forget-is_cwnd_limi.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit fcc0fb148dc4083b60ce7c06b5f5ec8cc858bdbf
Author: Neal Cardwell <ncardwell@xxxxxxxxxx>
Date:   Wed Sep 28 16:03:31 2022 -0400

    tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
    
    [ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ]
    
    This commit fixes a bug in the tracking of max_packets_out and
    is_cwnd_limited. This bug can cause the connection to fail to remember
    that is_cwnd_limited is true, causing the connection to fail to grow
    cwnd when it should, causing throughput to be lower than it should be.
    
    The following event sequence is an example that triggers the bug:
    
     (a) The connection is cwnd_limited, but packets_out is not at its
         peak due to TSO deferral deciding not to send another skb yet.
         In such cases the connection can advance max_packets_seq and set
         tp->is_cwnd_limited to true and max_packets_out to a small
         number.
    
    (b) Then later in the round trip the connection is pacing-limited (not
         cwnd-limited), and packets_out is larger. In such cases the
         connection would raise max_packets_out to a bigger number but
         (unexpectedly) flip tp->is_cwnd_limited from true to false.
    
    This commit fixes that bug.
    
    One straightforward fix would be to separately track (a) the next
    window after max_packets_out reaches a maximum, and (b) the next
    window after tp->is_cwnd_limited is set to true. But this would
    require consuming an extra u32 sequence number.
    
    Instead, to save space we track only the most important
    information. Specifically, we track the strongest available signal of
    the degree to which the cwnd is fully utilized:
    
    (1) If the connection is cwnd-limited then we remember that fact for
    the current window.
    
    (2) If the connection not cwnd-limited then we track the maximum
    number of outstanding packets in the current window.
    
    In particular, note that the new logic cannot trigger the buggy
    (a)/(b) sequence above because with the new logic a condition where
    tp->packets_out > tp->max_packets_out can only trigger an update of
    tp->is_cwnd_limited if tp->is_cwnd_limited is false.
    
    This first showed up in a testing of a BBRv2 dev branch, but this
    buggy behavior highlighted a general issue with the
    tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
    the proper rate for any TCP congestion control, including Reno or
    CUBIC.
    
    Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
    Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
    Signed-off-by: Kevin(Yudong) Yang <yyd@xxxxxxxxxx>
    Signed-off-by: Yuchung Cheng <ycheng@xxxxxxxxxx>
    Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 1192f1e76015..621ab5a7fb8f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -263,7 +263,7 @@ struct tcp_sock {
 	u32	packets_out;	/* Packets which are "in flight"	*/
 	u32	retrans_out;	/* Retransmitted packets out		*/
 	u32	max_packets_out;  /* max packets_out in last window */
-	u32	max_packets_seq;  /* right edge of max_packets_out flight */
+	u32	cwnd_usage_seq;  /* right edge of cwnd usage tracking flight */
 
 	u16	urg_data;	/* Saved octet of OOB data and control flags */
 	u8	ecn_flags;	/* ECN status bits.			*/
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 436ad4bc9d4c..487b6c5f53f4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1231,11 +1231,14 @@ static inline bool tcp_is_cwnd_limited(const struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 
+	if (tp->is_cwnd_limited)
+		return true;
+
 	/* If in slow start, ensure cwnd grows to twice what was ACKed. */
 	if (tcp_in_slow_start(tp))
 		return tp->snd_cwnd < 2 * tp->max_packets_out;
 
-	return tp->is_cwnd_limited;
+	return false;
 }
 
 /* BBR congestion control needs pacing.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 768a7daab559..e25130812cc8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2607,6 +2607,8 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_probes_out = 0;
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tp->snd_cwnd_cnt = 0;
+	tp->is_cwnd_limited = 0;
+	tp->max_packets_out = 0;
 	tp->window_clamp = 0;
 	tp->delivered = 0;
 	tp->delivered_ce = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 35bf58599223..8962864223b4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1638,15 +1638,20 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	/* Track the maximum number of outstanding packets in each
-	 * window, and remember whether we were cwnd-limited then.
+	/* Track the strongest available signal of the degree to which the cwnd
+	 * is fully utilized. If cwnd-limited then remember that fact for the
+	 * current window. If not cwnd-limited then track the maximum number of
+	 * outstanding packets in the current window. (If cwnd-limited then we
+	 * chose to not update tp->max_packets_out to avoid an extra else
+	 * clause with no functional impact.)
 	 */
-	if (!before(tp->snd_una, tp->max_packets_seq) ||
-	    tp->packets_out > tp->max_packets_out ||
-	    is_cwnd_limited) {
-		tp->max_packets_out = tp->packets_out;
-		tp->max_packets_seq = tp->snd_nxt;
+	if (!before(tp->snd_una, tp->cwnd_usage_seq) ||
+	    is_cwnd_limited ||
+	    (!tp->is_cwnd_limited &&
+	     tp->packets_out > tp->max_packets_out)) {
 		tp->is_cwnd_limited = is_cwnd_limited;
+		tp->max_packets_out = tp->packets_out;
+		tp->cwnd_usage_seq = tp->snd_nxt;
 	}
 
 	if (tcp_is_cwnd_limited(sk)) {



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux