Patch "tcp: avoid premature drops in tcp_add_backlog()" has been added to the 6.9-stable tree

This is a note to let you know that I've just added the patch titled

    tcp: avoid premature drops in tcp_add_backlog()

to the 6.9-stable tree, which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-avoid-premature-drops-in-tcp_add_backlog.patch
and it can be found in the queue-6.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 22e104eac37ec413518f64adcb2b1897122762e8
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date:   Tue Apr 23 12:56:20 2024 +0000

    tcp: avoid premature drops in tcp_add_backlog()
    
    [ Upstream commit ec00ed472bdb7d0af840da68c8c11bff9f4d9caa ]
    
    While testing TCP performance with the latest trees,
    I saw suspicious SOCKET_BACKLOG drops.
    
    tcp_add_backlog() computes its limit with:
    
        limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
                (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
        limit += 64 * 1024;
    
    This does not take into account that sk->sk_backlog.len
    is reset only at the very end of __release_sock().
    
    Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
    sk_rcvbuf in normal conditions.
    
    We should double the sk->sk_rcvbuf contribution in the formula
    to absorb bubbles in the backlog, which happen more often
    for very fast flows (a worked sketch of the arithmetic follows
    the sign-offs below).
    
    This change maintains decent protection against abuses.
    
    Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
    Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@xxxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
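
To make the arithmetic concrete: sk_add_backlog() drops once
sk_backlog.len plus sk_rmem_alloc exceeds the limit, and per the
commit message both terms can legitimately reach sk_rcvbuf before
__release_sock() resets the former. A minimal userspace sketch of the
old versus new limit, assuming illustrative 4 MiB buffers (the sizes
are not taken from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t sk_rcvbuf = 4u << 20;  /* assumed 4 MiB receive buffer */
        uint32_t sk_sndbuf = 4u << 20;  /* assumed 4 MiB send buffer */

        /* Old limit: sk_rcvbuf counted once. */
        uint64_t old_limit = (uint64_t)sk_rcvbuf + (sk_sndbuf >> 1)
                           + 64 * 1024;

        /* New limit: sk_rcvbuf counted twice, clamped as in the patch. */
        uint64_t new_limit = ((uint64_t)sk_rcvbuf << 1) + (sk_sndbuf >> 1)
                           + 64 * 1024;
        if (new_limit > UINT32_MAX)
                new_limit = UINT32_MAX;

        /* Worst case before __release_sock() resets sk_backlog.len:
         * both sk_backlog.len and sk_rmem_alloc sit at sk_rcvbuf.
         */
        uint64_t worst_case = 2 * (uint64_t)sk_rcvbuf;

        printf("old %llu  new %llu  worst case %llu\n",
               (unsigned long long)old_limit,
               (unsigned long long)new_limit,
               (unsigned long long)worst_case);
        /* worst case (8 MiB) >  old limit (~6 MiB)  -> premature drop
         * worst case (8 MiB) <= new limit (~10 MiB) -> packet queued
         */
        return 0;
}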

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e0cef75f85fb9..92511b7fd5249 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2001,7 +2001,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 		     enum skb_drop_reason *reason)
 {
-	u32 limit, tail_gso_size, tail_gso_segs;
+	u32 tail_gso_size, tail_gso_segs;
 	struct skb_shared_info *shinfo;
 	const struct tcphdr *th;
 	struct tcphdr *thtail;
@@ -2010,6 +2010,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	bool fragstolen;
 	u32 gso_segs;
 	u32 gso_size;
+	u64 limit;
 	int delta;
 
 	/* In case all data was pulled from skb frags (in __pskb_pull_tail()),
@@ -2107,7 +2108,13 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	__skb_push(skb, hdrlen);
 
 no_coalesce:
-	limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
+	/* sk->sk_backlog.len is reset only at the end of __release_sock().
+	 * Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
+	 * sk_rcvbuf in normal conditions.
+	 */
+	limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+
+	limit += ((u32)READ_ONCE(sk->sk_sndbuf)) >> 1;
 
 	/* Only socket owner can try to collapse/prune rx queues
 	 * to reduce memory overhead, so add a little headroom here.
@@ -2115,6 +2122,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 	 */
 	limit += 64 * 1024;
 
+	limit = min_t(u64, limit, UINT_MAX);
+
 	if (unlikely(sk_add_backlog(sk, skb, limit))) {
 		bh_unlock_sock(sk);
 		*reason = SKB_DROP_REASON_SOCKET_BACKLOG;
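
A note on the u64/min_t change in the last hunk: sk->sk_rcvbuf is an
int, so doubling it and adding the send-buffer half plus the 64 KB
headroom can exceed what 32 bits hold, while sk_add_backlog() takes
its limit as an unsigned int. Computing in u64 and clamping to
UINT_MAX avoids a silent wrap. A small userspace sketch of the wrap
the clamp guards against, assuming extreme, administratively raised
buffer tunings (not values from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Assume both buffers raised to INT32_MAX for illustration. */
        uint32_t sk_rcvbuf = INT32_MAX;
        uint32_t sk_sndbuf = INT32_MAX;

        uint64_t limit = ((uint64_t)sk_rcvbuf << 1)  /* ~4 GiB already */
                       + (sk_sndbuf >> 1) + 64 * 1024;

        /* In 32-bit arithmetic this sum would silently wrap; the patch
         * instead clamps, mirroring min_t(u64, limit, UINT_MAX).
         */
        uint32_t clamped = limit > UINT32_MAX ? UINT32_MAX
                                              : (uint32_t)limit;

        printf("u64 limit %llu -> clamped u32 %u\n",
               (unsigned long long)limit, clamped);
        return 0;
}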



