Patch "tcp: fix delayed ACKs for MSS boundary condition" has been added to the 4.19-stable tree

This is a note to let you know that I've just added the patch titled

    tcp: fix delayed ACKs for MSS boundary condition

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-fix-delayed-acks-for-mss-boundary-condition.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4854413f84d0a04743d1b8abea50c9dbdd9d39ea
Author: Neal Cardwell <ncardwell@xxxxxxxxxx>
Date:   Sun Oct 1 11:12:39 2023 -0400

    tcp: fix delayed ACKs for MSS boundary condition
    
    [ Upstream commit 4720852ed9afb1c5ab84e96135cb5b73d5afde6f ]
    
    This commit fixes poor delayed ACK behavior that can cause high TCP
    latency in a particular boundary condition: when an application makes
    a TCP socket write that is an exact multiple of the MSS size.
    
    The problem is that there is a painful boundary discontinuity in the
    current delayed ACK behavior, which works as follows:
    
    (1) If an app reads data when > 1*MSS is unacknowledged, then
        tcp_cleanup_rbuf() ACKs immediately because of:
    
         tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
    
    (2) If an app reads all received data, and the packets were < 1*MSS,
        and either (a) the app is not ping-pong or (b) we received two
        packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately because
        of:
    
         ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
          ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
           !inet_csk_in_pingpong_mode(sk))) &&
    
    (3) *However*: if an app reads exactly 1*MSS of data,
        tcp_cleanup_rbuf() does not send an immediate ACK. This is true
        even if the app is not ping-pong and the 1*MSS of data had the PSH
        bit set, suggesting the sending application completed an
        application write.
    
    Thus if the app is not ping-pong, we have this painful case where
    >1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
    write whose last skb is an exact multiple of 1*MSS can get a 40ms
    delayed ACK. This means that any app that transfers data in one
    direction and takes care to align write size or packet size with MSS
    can suffer this problem. With receive zero copy making 4KB MSS values
    more common, it is becoming more common to have application writes
    naturally align with MSS, and more applications are likely to
    encounter this delayed ACK problem.
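
    As a rough illustration of the triggering write pattern (this sketch
    is not part of the patch; the helper name, buffer handling, and use
    of TCP_MAXSEG are assumptions made only for illustration), a sender
    that aligns its writes to an exact multiple of the MSS could look
    like:

        #include <netinet/in.h>
        #include <netinet/tcp.h>
        #include <sys/socket.h>
        #include <unistd.h>

        /* Hypothetical user-space sender: trims each write to an exact
         * multiple of the MSS, the alignment that could previously leave
         * the sender waiting on a 40ms delayed ACK from the receiver.
         */
        int send_mss_aligned(int fd, const char *buf, size_t len)
        {
                int mss = 0;
                socklen_t optlen = sizeof(mss);

                /* Query the current MSS so the write can be trimmed to
                 * an exact multiple of it (the problematic alignment).
                 */
                if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &optlen) < 0)
                        return -1;

                size_t aligned = (len / mss) * mss;
                if (aligned == 0)
                        aligned = len;  /* buffer smaller than one MSS */

                /* The last skb of this write ends exactly on an MSS
                 * boundary and carries the PSH bit.
                 */
                if (write(fd, buf, aligned) < 0)
                        return -1;

                /* A cwnd-bound or TX zero-copy sender may now stall
                 * until the receiver ACKs; before this fix that ACK
                 * could be delayed by up to 40ms even if the receiving
                 * app read everything.
                 */
                return 0;
        }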
    
    The fix in this commit is to refine the delayed ACK heuristics with a
    simple check: immediately ACK a received 1*MSS skb with PSH bit set if
    the app reads all data. Why? If an skb has a len of exactly 1*MSS and
    has the PSH bit set then it is likely the end of an application
    write. So more data may not be arriving soon, and yet the data sender
    may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
    set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
    an ACK immediately if the app reads all of the data and is not
    ping-pong. Note that this logic is also executed for the case where
    len > MSS, but in that case this logic does not matter (and does not
    hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
    app reads data and there is more than an MSS of unACKed data.
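
    To make the combined behavior concrete, below is a small,
    self-contained model of the receive-side decision. This is not the
    kernel code; it only mirrors the conditions quoted in (1) and (2)
    above plus the new ICSK_ACK_PUSHED behavior, with field and flag
    names borrowed from the kernel for readability:

        #include <stdbool.h>

        #define ICSK_ACK_PUSHED   (1 << 0)
        #define ICSK_ACK_PUSHED2  (1 << 1)

        struct ack_state {
                unsigned int rcv_nxt;  /* next sequence expected */
                unsigned int rcv_wup;  /* rcv_nxt at last window update (last ACK) */
                unsigned int rcv_mss;  /* receiver's MSS estimate */
                unsigned int pending;  /* ICSK_ACK_* flags */
                bool pingpong;         /* interactive (ping-pong) connection? */
        };

        /* Model of the hunk this patch adds to tcp_measure_rcv_mss():
         * a full-MSS (or larger) skb with PSH set likely ends an
         * application write, so mark it for an immediate ACK on read.
         */
        static void on_skb_received(struct ack_state *s, unsigned int len, bool psh)
        {
                if (len >= s->rcv_mss && psh)
                        s->pending |= ICSK_ACK_PUSHED;
        }

        /* Model of tcp_cleanup_rbuf(): should we ACK right after the
         * application reads data?
         */
        static bool ack_now_after_read(const struct ack_state *s, bool read_all)
        {
                /* Case (1): more than one MSS is still unacknowledged. */
                if (s->rcv_nxt - s->rcv_wup > s->rcv_mss)
                        return true;

                /* Case (2), now also reached for an exact 1*MSS PSH skb,
                 * because on_skb_received() set ICSK_ACK_PUSHED.
                 */
                if (read_all &&
                    ((s->pending & ICSK_ACK_PUSHED2) ||
                     ((s->pending & ICSK_ACK_PUSHED) && !s->pingpong)))
                        return true;

                return false;  /* otherwise the ACK may be delayed ~40ms */
        }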
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
    Reviewed-by: Yuchung Cheng <ycheng@xxxxxxxxxx>
    Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Xin Guo <guoxin0309@xxxxxxxxx>
    Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@xxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9e1ec69fe5b46..0052a6194cc1a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -172,6 +172,19 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		if (unlikely(len > icsk->icsk_ack.rcv_mss +
 				   MAX_TCP_OPTION_SPACE))
 			tcp_gro_dev_warn(sk, skb, len);
+		/* If the skb has a len of exactly 1*MSS and has the PSH bit
+		 * set then it is likely the end of an application write. So
+		 * more data may not be arriving soon, and yet the data sender
+		 * may be waiting for an ACK if cwnd-bound or using TX zero
+		 * copy. So we set ICSK_ACK_PUSHED here so that
+		 * tcp_cleanup_rbuf() will send an ACK immediately if the app
+		 * reads all of the data and is not ping-pong. If len > MSS
+		 * then this logic does not matter (and does not hurt) because
+		 * tcp_cleanup_rbuf() will always ACK immediately if the app
+		 * reads data and there is more than an MSS of unACKed data.
+		 */
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_PSH)
+			icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.


