Patch "bpf: tcp_read_skb needs to pop skb regardless of seq" has been added to the 6.5-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 7 Oct 2023 08:18:45 -0400

This is a note to let you know that I've just added the patch titled

    bpf: tcp_read_skb needs to pop skb regardless of seq

to the 6.5-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     bpf-tcp_read_skb-needs-to-pop-skb-regardless-of-seq.patch
and it can be found in the queue-6.5 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit cda142636621f4423ca14781c98976b4c1bd508c
Author: John Fastabend <john.fastabend@xxxxxxxxx>
Date:   Mon Sep 25 20:52:58 2023 -0700

    bpf: tcp_read_skb needs to pop skb regardless of seq
    
    [ Upstream commit 9b7177b1df64b8d7f85700027c324aadd6aded00 ]
    
    Before fix e5c6de5fa0258 tcp_read_skb() would increment the tp->copied-seq
    value. This (as described in the commit) would cause an error for apps
    because once that is incremented the application might believe there is no
    data to be read. Then some apps would stall or abort believing no data is
    available.
    
    However, the fix is incomplete because it introduces another issue in
    the skb dequeue. The loop does tcp_recv_skb() in a while loop to consume
    as many skbs as possible. The problem is the call is ...
    
      tcp_recv_skb(sk, seq, &offset)
    
    ... where 'seq' is:
    
      u32 seq = tp->copied_seq;
    
    Now we can hit a case where we've yet incremented copied_seq from BPF side,
    but then tcp_recv_skb() fails this test ...
    
     if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
    
    ... so that instead of returning the skb we call tcp_eat_recv_skb() which
    frees the skb. This is because the routine believes the SKB has been collapsed
    per comment:
    
     /* This looks weird, but this can happen if TCP collapsing
      * splitted a fat GRO packet, while we released socket lock
      * in skb_splice_bits()
      */
    
    This can't happen here we've unlinked the full SKB and orphaned it. Anyways
    it would confuse any BPF programs if the data were suddenly moved underneath
    it.
    
    To fix this situation do simpler operation and just skb_peek() the data
    of the queue followed by the unlink. It shouldn't need to check this
    condition and tcp_read_skb() reads entire skbs so there is no need to
    handle the 'offset!=0' case as we would see in tcp_read_sock().
    
    Fixes: e5c6de5fa0258 ("bpf, sockmap: Incorrectly handling copied_seq")
    Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
    Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
    Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
    Reviewed-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/bpf/20230926035300.135096-2-john.fastabend@xxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 75f24b931a185..9cfc07d1e4252 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1618,16 +1618,13 @@ EXPORT_SYMBOL(tcp_read_sock);
 
 int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
-	u32 seq = tp->copied_seq;
 	struct sk_buff *skb;
 	int copied = 0;
-	u32 offset;
 
 	if (sk->sk_state == TCP_LISTEN)
 		return -ENOTCONN;
 
-	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
 		u8 tcp_flags;
 		int used;
 
@@ -1640,13 +1637,10 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 				copied = used;
 			break;
 		}
-		seq += used;
 		copied += used;
 
-		if (tcp_flags & TCPHDR_FIN) {
-			++seq;
+		if (tcp_flags & TCPHDR_FIN)
 			break;
-		}
 	}
 	return copied;
 }