Patch "tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO" has been added to the 6.1-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 29 Jun 2024 07:55:23 -0400

This is a note to let you know that I've just added the patch titled

    tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-fix-tcp_rcv_fastopen_synack-to-enter-tcp_ca_loss.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit b2df673c52d7f708afbec030fbef2985bd2a8d84
Author: Neal Cardwell <ncardwell@xxxxxxxxxx>
Date:   Mon Jun 24 14:43:23 2024 +0000

    tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
    
    [ Upstream commit 5dfe9d273932c647bdc9d664f939af9a5a398cbc ]
    
    Testing determined that the recent commit 9e046bb111f1 ("tcp: clear
    tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does
    not always ensure retrans_stamp is 0 after a TFO payload retransmit.
    
    If transmit completion for the SYN+data skb happens after the client
    TCP stack receives the SYNACK (which sometimes happens), then
    retrans_stamp can erroneously remain non-zero for the lifetime of the
    connection, causing a premature ETIMEDOUT later.
    
    Testing and tracing showed that the buggy scenario is the following
    somewhat tricky sequence:
    
    + Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO
      cookie + data in a single packet in the syn_data skb. It hands the
      syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially,
      it then reuses the same original (non-clone) syn_data skb,
      transforming it by advancing the seq by one byte and removing the
      FIN bit, and enques the resulting payload-only skb in the
      sk->tcp_rtx_queue.
    
    + Client sets retrans_stamp to the start time of the three-way
      handshake.
    
    + Cookie mismatches or server has TFO disabled, and server only ACKs
      SYN.
    
    + tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears
      retrans_stamp.
    
    + Since the client SYN was acked but not the payload, the TFO failure
      code path in tcp_rcv_fastopen_synack() tries to retransmit the
      payload skb.  However, in some cases the transmit completion for the
      clone of the syn_data (which had SYN + TFO cookie + data) hasn't
      happened.  In those cases, skb_still_in_host_queue() returns true
      for the retransmitted TFO payload, because the clone of the syn_data
      skb has not had its tx completetion.
    
    + Because skb_still_in_host_queue() finds skb_fclone_busy() is true,
      it sets the TSQ_THROTTLED bit and the retransmit does not happen in
      the tcp_rcv_fastopen_synack() call chain.
    
    + The tcp_rcv_fastopen_synack() code next implicitly assumes the
      retransmit process is finished, and sets retrans_stamp to 0 to clear
      it, but this is later overwritten (see below).
    
    + Later, upon tx completion, tcp_tsq_write() calls
      tcp_xmit_retransmit_queue(), which puts the retransmit in flight and
      sets retrans_stamp to a non-zero value.
    
    + The client receives an ACK for the retransmitted TFO payload data.
    
    + Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to
      make tcp_ack_is_dubious() true and make us call
      tcp_fastretrans_alert() and reach a code path that clears
      retrans_stamp, retrans_stamp stays nonzero.
    
    + Later, if there is a TLP, RTO, RTO sequence, then the connection
      will suffer an early ETIMEDOUT due to the erroneously ancient
      retrans_stamp.
    
    The fix: this commit refactors the code to have
    tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of
    tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and
    call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and
    tcp_rcv_fastopen_synack() share code in this way because in both cases
    we get a packet indicating non-congestion loss (MTU reduction or TFO
    failure) and thus in both cases we want to retransmit as many packets
    as cwnd allows, without reducing cwnd. And given that retransmits will
    set retrans_stamp to a non-zero value (and may do so in a later
    calling context due to TSQ), we also want to enter CA_Loss so that we
    track when all retransmitted packets are ACked and clear retrans_stamp
    when that happens (to ensure later recurring RTOs are using the
    correct retrans_stamp and don't declare ETIMEDOUT prematurely).
    
    Fixes: 9e046bb111f1 ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()")
    Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
    Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
    Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Cc: Yuchung Cheng <ycheng@xxxxxxxxxx>
    Link: https://patch.msgid.link/20240624144323.2371403-1-ncardwell.sw@xxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 852745a90aa8d..2146299016eda 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2754,13 +2754,37 @@ static void tcp_mtup_probe_success(struct sock *sk)
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
 }
 
+/* Sometimes we deduce that packets have been dropped due to reasons other than
+ * congestion, like path MTU reductions or failed client TFO attempts. In these
+ * cases we call this function to retransmit as many packets as cwnd allows,
+ * without reducing cwnd. Given that retransmits will set retrans_stamp to a
+ * non-zero value (and may do so in a later calling context due to TSQ), we
+ * also enter CA_Loss so that we track when all retransmitted packets are ACKed
+ * and clear retrans_stamp when that happens (to ensure later recurring RTOs
+ * are using the correct retrans_stamp and don't declare ETIMEDOUT
+ * prematurely).
+ */
+static void tcp_non_congestion_loss_retransmit(struct sock *sk)
+{
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (icsk->icsk_ca_state != TCP_CA_Loss) {
+		tp->high_seq = tp->snd_nxt;
+		tp->snd_ssthresh = tcp_current_ssthresh(sk);
+		tp->prior_ssthresh = 0;
+		tp->undo_marker = 0;
+		tcp_set_ca_state(sk, TCP_CA_Loss);
+	}
+	tcp_xmit_retransmit_queue(sk);
+}
+
 /* Do a simple retransmit without using the backoff mechanisms in
  * tcp_timer. This is used for path mtu discovery.
  * The socket is already locked here.
  */
 void tcp_simple_retransmit(struct sock *sk)
 {
-	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int mss;
@@ -2800,14 +2824,7 @@ void tcp_simple_retransmit(struct sock *sk)
 	 * in network, but units changed and effective
 	 * cwnd/ssthresh really reduced now.
 	 */
-	if (icsk->icsk_ca_state != TCP_CA_Loss) {
-		tp->high_seq = tp->snd_nxt;
-		tp->snd_ssthresh = tcp_current_ssthresh(sk);
-		tp->prior_ssthresh = 0;
-		tp->undo_marker = 0;
-		tcp_set_ca_state(sk, TCP_CA_Loss);
-	}
-	tcp_xmit_retransmit_queue(sk);
+	tcp_non_congestion_loss_retransmit(sk);
 }
 EXPORT_SYMBOL(tcp_simple_retransmit);
 
@@ -6145,8 +6162,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 			tp->fastopen_client_fail = TFO_DATA_NOT_ACKED;
 		skb_rbtree_walk_from(data)
 			 tcp_mark_skb_lost(sk, data);
-		tcp_xmit_retransmit_queue(sk);
-		tp->retrans_stamp = 0;
+		tcp_non_congestion_loss_retransmit(sk);
 		NET_INC_STATS(sock_net(sk),
 				LINUX_MIB_TCPFASTOPENACTIVEFAIL);
 		return true;