Patch "bpf, sockmap: Fix skb refcnt race after locking changes" has been added to the 6.1-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 13 Sep 2023 21:25:13 -0400

This is a note to let you know that I've just added the patch titled

    bpf, sockmap: Fix skb refcnt race after locking changes

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     bpf-sockmap-fix-skb-refcnt-race-after-locking-change.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 92ad45f46c91a35659178bbab45871873a544b2f
Author: John Fastabend <john.fastabend@xxxxxxxxx>
Date:   Fri Sep 1 13:21:37 2023 -0700

    bpf, sockmap: Fix skb refcnt race after locking changes
    
    [ Upstream commit a454d84ee20baf7bd7be90721b9821f73c7d23d9 ]
    
    There is a race where skbs from the sk_psock_backlog can be referenced
    after the userspace side has already called consume_skb() on the sk_buff
    and its refcnt has dropped to zero, causing a use after free.
    
    The flow is the following:
    
      while ((skb = skb_peek(&psock->ingress_skb)))
        sk_psock_handle_skb(psock, skb, ..., ingress)
        if (!ingress) ...
        sk_psock_skb_ingress
           sk_psock_skb_ingress_enqueue(skb)
              msg->skb = skb
              sk_psock_queue_msg(psock, msg)
        skb_dequeue(&psock->ingress_skb)
    
    The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
    what the application reads when recvmsg() is called. An application can
    read this anytime after the msg is placed on the queue. The recvmsg hook
    also reads msg->skb and, once user space has read the msg, calls
    consume_skb(skb) on it, effectively freeing it.
    
    The race is in the flow above: the backlog queue still refers to the
    skb and calls skb_dequeue(). If the skb_dequeue() happens after the
    user reads and frees the skb, we have a use after free.
    
    The !ingress case does not suffer from this problem because it uses
    sendmsg_*(sk, msg) which does not pass the sk_buff further down the
    stack.
    
    The following splat was observed with 'test_progs -t sockmap_listen':
    
      [ 1022.710250][ T2556] general protection fault, ...
      [...]
      [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
      [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
      [ 1022.713653][ T2556] Code: ...
      [...]
      [ 1022.720699][ T2556] Call Trace:
      [ 1022.720984][ T2556]  <TASK>
      [ 1022.721254][ T2556]  ? die_addr+0x32/0x80
      [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
      [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
      [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
      [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
      [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
      [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
      [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
      [ 1022.724386][ T2556]  kthread+0xfd/0x130
      [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
      [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
      [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
      [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
      [ 1022.726201][ T2556]  </TASK>
    
    To fix this we add an skb_get() before passing the skb to be enqueued in
    the ingress queue. This bumps the skb->users refcnt so that consume_skb()
    and kfree_skb() will not immediately free the sk_buff. With this we can
    be sure the skb is still around when we do the dequeue. Then we just
    need to decrement the refcnt or free the skb in the backlog case, which
    we do by calling kfree_skb() in the ingress case as well as the sendmsg
    case.
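    
    The refcount reasoning above can be illustrated with a small userspace
    model. This is only a sketch, not kernel code: struct toy_skb and the
    toy_* helpers are hypothetical stand-ins for struct sk_buff, skb_get()
    and consume_skb()/kfree_skb(), and the "freed" flag replaces a real free
    so the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for struct sk_buff: a refcount and a flag. */
struct toy_skb {
	int users;   /* models skb->users */
	bool freed;  /* set when the last reference is dropped */
};

/* Models skb_get(): take one more reference. */
static struct toy_skb *toy_skb_get(struct toy_skb *skb)
{
	skb->users++;
	return skb;
}

/* Models consume_skb()/kfree_skb(): drop a reference, "free" on the last. */
static void toy_skb_put(struct toy_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0)
		skb->freed = true;
}

/*
 * Replays the problematic ordering: the reader consumes the skb before the
 * backlog dequeues it. Returns true if the skb was still alive at the point
 * where the backlog touches it.
 */
static bool toy_backlog_sees_live_skb(bool take_extra_ref)
{
	struct toy_skb skb = { .users = 1, .freed = false };
	bool alive_at_dequeue;

	if (take_extra_ref)
		toy_skb_get(&skb);      /* the fix: extra reference for the backlog */

	toy_skb_put(&skb);              /* reader side: consume after recvmsg() */

	alive_at_dequeue = !skb.freed;  /* backlog side: dequeue happens here */

	if (alive_at_dequeue)
		toy_skb_put(&skb);      /* backlog side: drop its own reference */

	return alive_at_dequeue;
}
```

    Calling toy_backlog_sees_live_skb(false) models the code before this
    patch, where the reader's put drops the only reference; passing true
    models the extra skb_get(), where the backlog's later put is what
    releases the buffer.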
    
    Before the locking change named in the Fixes tag, we held the sock lock,
    so we couldn't race with the user and there was no issue here.
    
    Fixes: 799aa7f98d53e ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
    Reported-by: Jiri Olsa <jolsa@xxxxxxxxxx>
    Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
    Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
    Tested-by: Xu Kuohai <xukuohai@xxxxxxxxxx>
    Tested-by: Jiri Olsa <jolsa@xxxxxxxxxx>
    Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@xxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 296e45b6c3c0d..a5c1f67dc96ec 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -611,12 +611,18 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
+	int err = 0;
+
 	if (!ingress) {
 		if (!sock_writeable(psock->sk))
 			return -EAGAIN;
 		return skb_send_sock(psock->sk, skb, off, len);
 	}
-	return sk_psock_skb_ingress(psock, skb, off, len);
+	skb_get(skb);
+	err = sk_psock_skb_ingress(psock, skb, off, len);
+	if (err < 0)
+		kfree_skb(skb);
+	return err;
 }
 
 static void sk_psock_skb_state(struct sk_psock *psock,
@@ -684,9 +690,7 @@ static void sk_psock_backlog(struct work_struct *work)
 		} while (len);
 
 		skb = skb_dequeue(&psock->ingress_skb);
-		if (!ingress) {
-			kfree_skb(skb);
-		}
+		kfree_skb(skb);
 	}
 end:
 	mutex_unlock(&psock->work_mutex);