Re: [syzbot] KASAN: use-after-free Write in sco_sock_timeout

Desmond Cheong Zhi Xi <desmondcheongzx@xxxxxxxxx> · Mon, 30 Aug 2021 02:34:11 +0800

On 29/8/21 10:53 pm, Desmond Cheong Zhi Xi wrote:
On 29/8/21 4:29 pm, Hillf Danton wrote:
On Fri, 27 Aug 2021 15:58:34 +0800 Desmond Cheong Zhi Xi wrote:
On 27/8/21 9:19 am, Hillf Danton wrote:
On Thu, 26 Aug 2021 09:29:24 -0700
syzbot found the following issue on:

HEAD commit:    e3f30ab28ac8 Merge branch 'pktgen-samples-next'
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13249c96300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ef482942966bf763
dashboard link: https://syzkaller.appspot.com/bug?extid=2bef95d3ab4daa10155b
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a29ea9300000

The issue was bisected to:

commit e1dee2c1de2b4dd00eb44004a4bda6326ed07b59
Author: Desmond Cheong Zhi Xi <desmondcheongzx@xxxxxxxxx>
Date:   Tue Aug 10 04:14:10 2021 +0000

      Bluetooth: fix repeated calls to sco_sock_kill

To fix the uaf, grab another hold to sock to make the timeout work safe.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git e3f30ab28ac8

--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -190,15 +190,14 @@ static void sco_conn_del(struct hci_conn
      sco_conn_unlock(conn);
      if (sk) {
-        sock_hold(sk);
          lock_sock(sk);
          sco_sock_clear_timer(sk);
          sco_chan_del(sk, err);
          release_sock(sk);
-        sock_put(sk);
          /* Ensure no more work items will run before freeing conn. */
          cancel_delayed_work_sync(&conn->timeout_work);
+        sock_put(sk);

Hi Hillf,

I saw that this passed the reproducer. On closer inspection, though, I
think that is because sco_conn_del never runs.

So the extra sock_hold prevents the UAF, but only because the reference
count now never drops to 0. In my opinion, something closer to your
previous proposal (plus handling the other calls to __sco_sock_close),
where we call cancel_delayed_work_sync after the channel is deleted,
would address the root cause better.
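
To illustrate the refcount concern, here is a small user-space sketch. It is
not kernel code: struct ref_sock, ref_hold and ref_put are hypothetical
stand-ins for struct sock, sock_hold and sock_put, and it assumes the extra
hold is only dropped in the teardown path (sco_conn_del in the patch above).
If that path never runs, the count never reaches 0 and the object is leaked
rather than freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: a refcount and a "freed" marker. */
struct ref_sock {
	int refcnt;
	bool freed;
};

/* Stand-in for sock_hold(): take one more reference. */
static void ref_hold(struct ref_sock *sk)
{
	assert(!sk->freed);
	sk->refcnt++;
}

/* Stand-in for sock_put(): drop a reference, "free" on the last one. */
static void ref_put(struct ref_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->freed = true;	/* real code would release the memory here */
}

/*
 * Returns true if the object ends up freed once the owner closes it.
 * teardown_runs models whether the path holding the matching put
 * (assumed to be the only place the extra hold is dropped) executes.
 */
static bool freed_after_close(bool teardown_runs)
{
	struct ref_sock sk = { .refcnt = 1, .freed = false };

	ref_hold(&sk);		/* the extra hold (assumption: taken once) */
	if (teardown_runs)
		ref_put(&sk);	/* matching put in the teardown path */
	ref_put(&sk);		/* owner drops its own reference on close */
	return sk.freed;
}
```

With the teardown path skipped, freed_after_close(false) reports that the
object is never freed, which hides the UAF without fixing it.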

Just my two cents.


OK, I went back and did a more thorough audit. Even without calling
cancel_delayed_work_sync, sco_sock_timeout should not cause a UAF.

I believe the real issue is that we can allocate a connection twice in
sco_connect. The first connection is then lost, and we are unable to
clean it up properly.
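
One interleaving that would be consistent with this is two callers both
passing the state check before either takes the socket lock. The sketch
below replays that ordering in a single thread. It is a user-space model
only: struct toy_sock, the toy_* helpers and TOY_EBADFD are hypothetical
names, and the "lock" is a plain flag standing in for lock_sock and
release_sock.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EBADFD 77	/* stand-in error value for this sketch */

enum toy_state { TOY_OPEN, TOY_CONNECT };

struct toy_sock {
	bool locked;		/* stand-in for lock_sock()/release_sock() */
	enum toy_state state;
	int conns_allocated;	/* connections set up for this socket */
};

static void toy_lock(struct toy_sock *sk)
{
	assert(!sk->locked);
	sk->locked = true;
}

static void toy_unlock(struct toy_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* Old ordering, step 1: state check done before taking the lock. */
static int toy_check_unlocked(const struct toy_sock *sk)
{
	return sk->state == TOY_OPEN ? 0 : -TOY_EBADFD;
}

/* Old ordering, step 2: set up the connection under the lock, no re-check. */
static void toy_setup_conn(struct toy_sock *sk)
{
	toy_lock(sk);
	sk->conns_allocated++;
	sk->state = TOY_CONNECT;
	toy_unlock(sk);
}

/* New ordering: the check and the setup share one lock hold. */
static int toy_connect_checked(struct toy_sock *sk)
{
	int err = 0;

	toy_lock(sk);
	if (sk->state != TOY_OPEN) {
		err = -TOY_EBADFD;
		goto done;
	}
	sk->conns_allocated++;
	sk->state = TOY_CONNECT;
done:
	toy_unlock(sk);
	return err;
}

/* Replay: callers A and B both pass the unlocked check, then both set up. */
static int toy_replay_old(void)
{
	struct toy_sock sk = { .locked = false, .state = TOY_OPEN, .conns_allocated = 0 };
	int a = toy_check_unlocked(&sk);
	int b = toy_check_unlocked(&sk);

	if (a == 0)
		toy_setup_conn(&sk);
	if (b == 0)
		toy_setup_conn(&sk);
	return sk.conns_allocated;
}

/* Replay: the same two callers with the check moved under the lock. */
static int toy_replay_new(void)
{
	struct toy_sock sk = { .locked = false, .state = TOY_OPEN, .conns_allocated = 0 };

	(void)toy_connect_checked(&sk);
	(void)toy_connect_checked(&sk);
	return sk.conns_allocated;
}
```

In this model toy_replay_old() ends with two connections set up for one
socket, while toy_replay_new() ends with one because the second caller is
rejected under the lock.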

Thoughts on this?

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git e3f30ab28ac8

--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -578,9 +578,6 @@ static int sco_sock_connect(struct socket *sock, struct sockaddr *addr, int alen
 	    addr->sa_family != AF_BLUETOOTH)
 		return -EINVAL;
 
-	if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND)
-		return -EBADFD;
-
 	if (sk->sk_type != SOCK_SEQPACKET)
 		return -EINVAL;
 
@@ -591,6 +588,13 @@ static int sco_sock_connect(struct socket *sock, struct sockaddr *addr, int alen
 
 	lock_sock(sk);
 
+	if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
+		hci_dev_unlock(hdev);
+		hci_dev_put(hdev);
+		err = -EBADFD;
+		goto done;
+	}
+
 	/* Set destination address and psm */
 	bacpy(&sco_pi(sk)->dst, &sa->sco_bdaddr);