Patch "bpf, sockmap: Pass skb ownership through read_skb" has been added to the 6.1-stable tree

This is a note to let you know that I've just added the patch titled

    bpf, sockmap: Pass skb ownership through read_skb

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     bpf-sockmap-pass-skb-ownership-through-read_skb.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4df24f4760fc039a3b25c34569712d7615d3b5a4
Author: John Fastabend <john.fastabend@xxxxxxxxx>
Date:   Mon May 22 19:56:05 2023 -0700

    bpf, sockmap: Pass skb ownership through read_skb
    
    [ Upstream commit 78fa0d61d97a728d306b0c23d353c0e340756437 ]
    
    The read_skb hook calls consume_skb() now, but this means that if the
    recv_actor program wants to use the skb it needs to inc the ref cnt
    so that the consume_skb() doesn't kfree the sk_buff.
    
    This is problematic because in some error cases under memory pressure
    we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
    Then we get this,
    
     skb_linearize()
       __pskb_pull_tail()
         pskb_expand_head()
           BUG_ON(skb_shared(skb))
    
    Because sk_psock_verdict_recv() incremented the users refcnt, the
    sk_buff is shared (refcnt > 1) and we trip the BUG_ON.
    
    To fix, let's simply pass ownership of the sk_buff through the read_skb
    call. Then we can drop the consume from the read_skb handlers and assume
    the verdict recv does any required kfree.
    
    Bug found while testing in our CI which runs in VMs that hit memory
    constraints rather regularly. William tested TCP read_skb handlers.
    
    [  106.536188] ------------[ cut here ]------------
    [  106.536197] kernel BUG at net/core/skbuff.c:1693!
    [  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
    [  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
    [  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
    [  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
    [  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
    [  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
    [  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
    [  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
    [  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
    [  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
    [  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
    [  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
    [  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  106.542255] Call Trace:
    [  106.542383]  <IRQ>
    [  106.542487]  __pskb_pull_tail+0x4b/0x3e0
    [  106.542681]  skb_ensure_writable+0x85/0xa0
    [  106.542882]  sk_skb_pull_data+0x18/0x20
    [  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
    [  106.543536]  ? migrate_disable+0x66/0x80
    [  106.543871]  sk_psock_verdict_recv+0xe2/0x310
    [  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
    [  106.544561]  tcp_read_skb+0x7b/0x120
    [  106.544740]  tcp_data_queue+0x904/0xee0
    [  106.544931]  tcp_rcv_established+0x212/0x7c0
    [  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
    [  106.545326]  tcp_v4_rcv+0xe70/0xf60
    [  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
    [  106.545744]  ip_local_deliver_finish+0xa7/0x150
    
    Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
    Reported-by: William Findlay <will@xxxxxxxxxxxxx>
    Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
    Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
    Tested-by: William Findlay <will@xxxxxxxxxxxxx>
    Reviewed-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@xxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
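
The ownership change described in the commit message above can be modelled in
userspace. The following is only a minimal sketch, not kernel code: every
toy_* name, old_read_skb(), new_read_skb(), old_actor() and new_actor() are
hypothetical stand-ins invented for illustration. The model assumes the actor
always frees the buffer, whereas the real verdict path may instead queue it.
toy_linearize() returns -1 where the kernel would hit BUG_ON(skb_shared(skb)).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only what the sketch needs. */
struct toy_skb {
	int users;	/* models the skb users refcount */
	int len;
};

static int toy_freed;	/* number of buffers actually released */

static struct toy_skb *toy_alloc(int len)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->users = 1;		/* the allocator hands out a single reference */
	skb->len = len;
	return skb;
}

/* Models consume_skb()/kfree_skb(): drop one reference, free on the last. */
static void toy_put(struct toy_skb *skb)
{
	if (--skb->users == 0) {
		free(skb);
		toy_freed++;
	}
}

/* Models skb_shared(). */
static bool toy_shared(const struct toy_skb *skb)
{
	return skb->users > 1;
}

/* Models the precondition checked in pskb_expand_head(). */
static int toy_linearize(struct toy_skb *skb)
{
	return toy_shared(skb) ? -1 : 0;
}

typedef int (*toy_actor_t)(struct toy_skb *skb);

/* Old scheme: the actor takes an extra reference, the caller consumes. */
static int old_actor(struct toy_skb *skb)
{
	int ret;

	skb->users++;			/* like skb_get() in the old verdict recv */
	ret = toy_linearize(skb);	/* fails: users == 2 */
	toy_put(skb);			/* actor drops its own reference */
	return ret;
}

static int old_read_skb(toy_actor_t actor)
{
	struct toy_skb *skb = toy_alloc(64);
	int ret;

	if (!skb)
		return -2;
	ret = actor(skb);
	toy_put(skb);			/* like the removed consume_skb() */
	return ret;
}

/* New scheme: ownership passes to the actor, which does the final free. */
static int new_actor(struct toy_skb *skb)
{
	int ret = toy_linearize(skb);	/* succeeds: users == 1 */

	toy_put(skb);
	return ret;
}

static int new_read_skb(toy_actor_t actor)
{
	struct toy_skb *skb = toy_alloc(64);

	if (!skb)
		return -2;
	return actor(skb);		/* no consume here; the actor owns skb */
}
```

In this model, old_read_skb(old_actor) reports the shared-buffer failure while
new_read_skb(new_actor) does not, and each path releases its buffer exactly
once.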

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 53d0251788aa2..2b6d9519ff29c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1180,8 +1180,6 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 	int ret = __SK_DROP;
 	int len = skb->len;
 
-	skb_get(skb);
-
 	rcu_read_lock();
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1fb67f819de49..31156ebb759c0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1772,7 +1772,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
 		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
 		used = recv_actor(sk, skb);
-		consume_skb(skb);
 		if (used < 0) {
 			if (!copied)
 				copied = used;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3ffa30c37293e..956d6797c76f3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1806,7 +1806,7 @@ EXPORT_SYMBOL(__skb_recv_udp);
 int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	struct sk_buff *skb;
-	int err, copied;
+	int err;
 
 try_again:
 	skb = skb_recv_udp(sk, MSG_DONTWAIT, &err);
@@ -1825,10 +1825,7 @@ int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	}
 
 	WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
-	copied = recv_actor(sk, skb);
-	kfree_skb(skb);
-
-	return copied;
+	return recv_actor(sk, skb);
 }
 EXPORT_SYMBOL(udp_read_skb);
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 70eb3bc67126d..5b19b6c53a2cb 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2552,7 +2552,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	struct unix_sock *u = unix_sk(sk);
 	struct sk_buff *skb;
-	int err, copied;
+	int err;
 
 	mutex_lock(&u->iolock);
 	skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
@@ -2560,10 +2560,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	if (!skb)
 		return err;
 
-	copied = recv_actor(sk, skb);
-	kfree_skb(skb);
-
-	return copied;
+	return recv_actor(sk, skb);
 }
 
 /*


