Patch "tcp: fix incorrect undo caused by DSACK of TLP retransmit" has been added to the 6.9-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    tcp: fix incorrect undo caused by DSACK of TLP retransmit

to the 6.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-fix-incorrect-undo-caused-by-dsack-of-tlp-retran.patch
and it can be found in the queue-6.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit f15a7324f90226993462fa3abef149e1aa332d6c
Author: Neal Cardwell <ncardwell@xxxxxxxxxx>
Date:   Wed Jul 3 13:12:46 2024 -0400

    tcp: fix incorrect undo caused by DSACK of TLP retransmit
    
    [ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]
    
    Loss recovery undo_retrans bookkeeping had a long-standing bug where a
    DSACK from a spurious TLP retransmit packet could cause an erroneous
    undo of a fast recovery or RTO recovery that repaired a single
    really-lost packet (in a sequence range outside that of the TLP
    retransmit). Basically, because the loss recovery state machine didn't
    account for the fact that it sent a TLP retransmit, the DSACK for the
    TLP retransmit could erroneously be implicitly be interpreted as
    corresponding to the normal fast recovery or RTO recovery retransmit
    that plugged a real hole, thus resulting in an improper undo.
    
    For example, consider the following buggy scenario where there is a
    real packet loss but the congestion control response is improperly
    undone because of this bug:
    
    + send packets P1, P2, P3, P4
    + P1 is really lost
    + send TLP retransmit of P4
    + receive SACK for original P2, P3, P4
    + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
    + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
    + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
    
    The fix: when we initialize undo machinery in tcp_init_undo(), if
    there is a TLP retransmit in flight, then increment tp->undo_retrans
    so that we make sure that we receive a DSACK corresponding to the TLP
    retransmit, as well as DSACKs for all later normal retransmits, before
    triggering a loss recovery undo. Note that we also have to move the
    line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
    we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
    it only afterward.
    
    Also note that the bug dates back to the original 2013 TLP
    implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").
    
    However, this patch will only compile and work correctly with kernels
    that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
    commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
    So we associate this fix with that later commit.
    
    Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
    Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
    Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Yuchung Cheng <ycheng@xxxxxxxxxx>
    Cc: Kevin Yang <yyd@xxxxxxxxxx>
    Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@xxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7b692bcb61d4a..c765d479869dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2126,8 +2126,16 @@ void tcp_clear_retrans(struct tcp_sock *tp)
 static inline void tcp_init_undo(struct tcp_sock *tp)
 {
 	tp->undo_marker = tp->snd_una;
+
 	/* Retransmission still in flight may cause DSACKs later. */
-	tp->undo_retrans = tp->retrans_out ? : -1;
+	/* First, account for regular retransmits in flight: */
+	tp->undo_retrans = tp->retrans_out;
+	/* Next, account for TLP retransmits in flight: */
+	if (tp->tlp_high_seq && tp->tlp_retrans)
+		tp->undo_retrans++;
+	/* Finally, avoid 0, because undo_retrans==0 means "can undo now": */
+	if (!tp->undo_retrans)
+		tp->undo_retrans = -1;
 }
 
 static bool tcp_is_rack(const struct sock *sk)
@@ -2206,6 +2214,7 @@ void tcp_enter_loss(struct sock *sk)
 
 	tcp_set_ca_state(sk, TCP_CA_Loss);
 	tp->high_seq = tp->snd_nxt;
+	tp->tlp_high_seq = 0;
 	tcp_ecn_queue_cwr(tp);
 
 	/* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index f96f68cf7961c..22d25f63858b9 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -534,8 +534,6 @@ void tcp_retransmit_timer(struct sock *sk)
 	if (WARN_ON_ONCE(!skb))
 		return;
 
-	tp->tlp_high_seq = 0;
-
 	if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
 	    !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
 		/* Receiver dastardly shrinks window. Our retransmits




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux