Ooooooops, I just found that Luiz has already figured out one good patch:
https://www.spinics.net/lists/linux-bluetooth/msg92649.html

Sorry for the noise, and happy weekend.

Thanks
Lin Ma

> -----Original Messages-----
> From: LinMa <linma@xxxxxxxxxx>
> Sent Time: 2021-07-17 23:41:22 (Saturday)
> To: "Tetsuo Handa" <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: "Desmond Cheong Zhi Xi" <desmondcheongzx@xxxxxxxxx>, "Luiz Augusto von Dentz" <luiz.dentz@xxxxxxxxx>, "Johan Hedberg" <johan.hedberg@xxxxxxxxx>, "Marcel Holtmann" <marcel@xxxxxxxxxxxx>, "linux-bluetooth@xxxxxxxxxxxxxxx" <linux-bluetooth@xxxxxxxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, "Jakub Kicinski" <kuba@xxxxxxxxxx>, "open list:NETWORKING [GENERAL]" <netdev@xxxxxxxxxxxxxxx>
> Subject: Yet Another Patch for CVE-2021-3573
>
> Hi everyone,
>
> After reading through a lot of code in the net directory of the kernel, I have
> the following thoughts and would appreciate your suggestions.
>
> >> >> Saw this and thought I'd offer my two cents.
> >> >> BUG: sleeping function called from invalid context
> >> >> This is the original problem that Tetsuo's patch was trying to fix.
> >>
> >> Yes.
> >>
> >> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> >> because of the mutex_acquire.
> >>
> >> Both lock_sock() and lock_sock_nested() might sleep.
> >>
> >> >> But we shouldn't sleep while holding the rw spinlock.
> >>
> >> Right. In atomic context (e.g. inside an interrupt handler, schedulable context
> >> with interrupts or preemption disabled, schedulable context inside an RCU read
> >> critical section, schedulable context inside a spinlock section), we must not
> >> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> >> a page fault) which are not atomic.
> >>
> >> >> So we either have to acquire a spinlock instead of a mutex as was done before,
> >>
> >> Regarding the hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
> >>
> >> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> >> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> >> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> >> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> >> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> >> immediately destroys resources associated with this hdev.
> >>
>
> This is the critical part of the BUG here. You can find similar code, for
> example, in net/iucv/af_iucv.c.
>
> The iucv_sock_bind() function will bind the device:
>
> static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr,
>                           int addr_len)
> {
>         ...
>         iucv->hs_dev = dev;
>
> And this field is set to NULL only when the socket is closed:
>
> static void iucv_sock_close(struct sock *sk)
> {
>         ...
>         if (iucv->hs_dev) {
>                 dev_put(iucv->hs_dev);
>                 iucv->hs_dev = NULL;
>                 sk->sk_bound_dev_if = 0;
>         }
>
> Even the afiucv_netdev_event() function has no business with iucv->hs_dev.
>
> So why does hci_sock_dev_event(HCI_DEV_UNREG) need to set the pointer to NULL
> and then decrease the ref-count? As Tetsuo said, because hci_unregister_dev(),
> the caller of hci_sock_dev_event(), will reclaim the resources of the hdev
> object: it will destroy the workqueue and also clean up the sysfs entries.
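
To make the iucv-style lifetime handling above concrete, here is a minimal
userspace model of that ref-count pattern (plain C11 atomics; names such as
struct dev, dev_put_ref() and sock_pi are made up for illustration and are not
kernel APIs). It only shows that a socket holding its own reference keeps the
object's memory valid until the socket's release, so the unregister path never
has to reach into bound sockets; it does not model the fact that the hdev's
workqueue and sysfs entries are already gone by then.

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct dev {                       /* stand-in for struct hci_dev */
        atomic_int refcnt;
        unsigned long flags;
};

struct sock_pi {                   /* stand-in for struct hci_pinfo */
        struct dev *hdev;          /* pinned while the socket is open */
};

static struct dev *dev_get_ref(struct dev *d)
{
        atomic_fetch_add(&d->refcnt, 1);
        return d;
}

static void dev_put_ref(struct dev *d)
{
        if (atomic_fetch_sub(&d->refcnt, 1) == 1)
                free(d);           /* the last reference frees the object */
}

static void sock_bind(struct sock_pi *pi, struct dev *d)
{
        pi->hdev = dev_get_ref(d); /* bind takes its own reference */
}

static void dev_unregister(struct dev *d)
{
        /* Tear down queues/sysfs here, but do not touch bound sockets;
         * only the registration reference is dropped. */
        dev_put_ref(d);
}

static void sock_release(struct sock_pi *pi)
{
        if (pi->hdev) {
                dev_put_ref(pi->hdev); /* the pointer stays valid until here */
                pi->hdev = NULL;
        }
}

int main(void)
{
        struct dev *d = calloc(1, sizeof(*d));
        struct sock_pi pi = { 0 };

        atomic_init(&d->refcnt, 1);    /* registration reference */
        sock_bind(&pi, d);
        dev_unregister(d);             /* memory survives: the socket still holds a ref */
        printf("flags after unregister: %lu\n", pi.hdev->flags); /* legal read */
        sock_release(&pi);             /* last put frees the object */
        return 0;
}
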
>
> If we implement our patches like the iucv stack, or some other ref-count idea
> (https://lkml.org/lkml/2021/6/22/1347), without care, bad things will happen:
> because there is nothing useful left in the hdev object, any changes to it
> make no sense.
>
> But wait: writes to or dereferences of this object may be illegal, but there
> should still be some legal actions, like reading its flags?
>
> Hence, we can still delay the release of the hdev object to hci_sock_release()
> (like other net code does). We just need to take care of the checking part.
>
> One quick patch is shown below; my POC didn't trigger any warning, but more
> checks are needed.
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index 251b9128f..db665f78a 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -764,12 +764,9 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
>          sk_for_each(sk, &hci_sk_list.head) {
>                  bh_lock_sock_nested(sk);
>                  if (hci_pi(sk)->hdev == hdev) {
> -                        hci_pi(sk)->hdev = NULL;
>                          sk->sk_err = EPIPE;
>                          sk->sk_state = BT_OPEN;
>                          sk->sk_state_change(sk);
> -
> -                        hci_dev_put(hdev);
>                  }
>                  bh_unlock_sock(sk);
>          }
> @@ -880,6 +877,7 @@ static int hci_sock_release(struct socket *sock)
>
>                  atomic_dec(&hdev->promisc);
>                  hci_dev_put(hdev);
> +                hci_pi(sk)->hdev = NULL;
>          }
>
>          sock_orphan(sk);
> @@ -1727,10 +1725,10 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
>                  break;
>          case HCI_CHANNEL_MONITOR:
>                  err = -EOPNOTSUPP;
> -                goto done;
> +                goto donefast;
>          case HCI_CHANNEL_LOGGING:
>                  err = hci_logging_frame(sk, msg, len);
> -                goto done;
> +                goto donefast;
>          default:
>                  mutex_lock(&mgmt_chan_list_lock);
>                  chan = __hci_mgmt_chan_find(hci_pi(sk)->channel);
> @@ -1740,15 +1738,16 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
>                          err = -EINVAL;
>
>                  mutex_unlock(&mgmt_chan_list_lock);
> -                goto done;
> +                goto donefast;
>          }
>
>          hdev = hci_pi(sk)->hdev;
>          if (!hdev) {
>                  err = -EBADFD;
> -                goto done;
> +                goto donefast;
>          }
>
> +        hci_dev_lock(hdev);
>          if (!test_bit(HCI_UP, &hdev->flags)) {
>                  err = -ENETDOWN;
>                  goto done;
> @@ -1832,6 +1831,8 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
>                  err = len;
>
>  done:
> +        hci_dev_unlock(hdev);
> +donefast:
>          release_sock(sk);
>          return err;
>
> In short, this patch delays the hci_dev_put() to hci_sock_release() and keeps
> the old bh_lock_sock_nested().
>
> Once we do that, the UAF in hci_sock_bound_ioctl() is fixed. (The four
> different commands in hci_sock_bound_ioctl() will just traverse an empty
> linked list.)
>
> For the other UAF point, hci_sock_sendmsg(), this patch uses hci_dev_lock() to
> make sure the flags and resources in hdev will not be released until the
> sendmsg is finished. (Unlike hci_sock_create(), hci_sock_sendmsg() can sleep,
> so taking the mutex is possible.)
>
> Of course, more auditing is needed, but I just wanted to share this with you.
> Any suggestions and discussion will be much appreciated.
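
A side note on the shape of the patch above: the done/donefast split is the
usual staged-unlock pattern, with one exit label for paths that hold only the
socket lock and one for paths that have additionally taken the device lock.
Below is a minimal userspace sketch of that control flow, with pthread mutexes
standing in for lock_sock() and hci_dev_lock() (purely illustrative, not the
real hci_sock_sendmsg(); the channel value and helper names are made up):

#include <errno.h>
#include <pthread.h>

static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ lock_sock() */
static pthread_mutex_t dev_lock  = PTHREAD_MUTEX_INITIALIZER; /* ~ hci_dev_lock() */

static int send_msg(int channel, int dev_present, int dev_up)
{
        int err = 0;

        pthread_mutex_lock(&sock_lock);

        if (channel == 2) {             /* e.g. an unsupported channel */
                err = -EOPNOTSUPP;
                goto donefast;          /* device lock was never taken */
        }

        if (!dev_present) {
                err = -EBADFD;
                goto donefast;
        }

        pthread_mutex_lock(&dev_lock);  /* from here on, unwind via "done" */

        if (!dev_up) {
                err = -ENETDOWN;
                goto done;
        }

        /* ... build and queue the frame while both locks are held ... */

done:
        pthread_mutex_unlock(&dev_lock);
donefast:
        pthread_mutex_unlock(&sock_lock);
        return err;
}

int main(void)
{
        return send_msg(2, 1, 1) == -EOPNOTSUPP ? 0 : 1;
}

Every early-exit path releases exactly the locks it has taken, which is why the
patch retargets the failures that happen before hci_dev_lock() from done to
donefast.
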
>
> >> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
> >>
> >> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> >> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> >> This patch should be sent to linux.git and the stable trees as soon as possible. But due to
> >> the little attention on this patch, I'm already testing it in linux-next.git via my tree.
> >> I'll drop it when the Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should
> >> I directly send it to Linus?)
> >>
> >> >
> >> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> >> > not hci_sock_dev_event.
> >>
> >> I didn't catch this part. Are you talking about a different poc?
> >> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
> >>
> >> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> >> calls copy_from_user(), which might cause a page fault, and the userfaultfd mechanism allows
> >> an attacker to slow down page fault handling enough for hci_sock_dev_event(HCI_DEV_UNREG) to
> >> return without waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This
> >> race window results in a UAF (doesn't it, LinMa?).
> >>
> >> > In this case, it's not clear to me why the atomic context is being violated.
> >>
> >> In atomic context (in the hci_sock_dev_event(HCI_DEV_UNREG) case, between
> >> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> >> lock_sock(sk), which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
> >>
> >> >
> >> > Sorry for the noise.
> >> >
> >> >>>
> >> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock(), so I
> >> >>> also tried that:
> >> >>>
> >> >>> --- a/net/bluetooth/hci_sock.c
> >> >>> +++ b/net/bluetooth/hci_sock.c
> >> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>>          /* Detach sockets from device */
> >> >>>          read_lock(&hci_sk_list.lock);
> >> >>>          sk_for_each(sk, &hci_sk_list.head) {
> >> >>> +                local_bh_disable();
> >> >>>                  bh_lock_sock_nested(sk);
> >> >>>                  if (hci_pi(sk)->hdev == hdev) {
> >> >>>                          hci_pi(sk)->hdev = NULL;
> >> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>>                          hci_dev_put(hdev);
> >> >>>                  }
> >> >>>                  bh_unlock_sock(sk);
> >> >>> +                local_bh_enable();
> >> >>>          }
> >> >>>          read_unlock(&hci_sk_list.lock);
> >> >>>  }
> >> >>>
> >> >>> But this is not useful; the UAF still occurs.
> >> >>>
> >> >>
> >> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
> >>
> >> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
> >>
> >> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> >> user contexts and bottom halves,
> >>
> >> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
> >>
> >> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> >> multiple users.
> >>
> >> serializes access to resources which are accessed only from schedulable (i.e. non-atomic) contexts
> >>
> >> >>
> >> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> >> sleeping functions aren't called between the bh_lock/unlock.
> >>
> >> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).
>
> Regards
>
> Lin Ma
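
To make the point about mixing the two socket locks concrete, below is a small
userspace model of the race described in the quoted thread: a pthread mutex
stands in for lock_sock(), a pthread spinlock for bh_lock_sock(), and a sleep()
for the copy_from_user() that userfaultfd keeps stalled. All names are
illustrative; the program only demonstrates that taking the spinlock does not
wait for the mutex holder, so the "unregister" side can detach the pointer
while the "ioctl" side is still using its stale copy.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t    user_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ lock_sock() */
static pthread_spinlock_t bh_lock;                                /* ~ bh_lock_sock() */

static int device;                   /* stand-in for the hdev object */
static int *hdev = &device;          /* stand-in for hci_pi(sk)->hdev */

static void *ioctl_path(void *arg)   /* ~ hci_sock_bound_ioctl() */
{
        (void)arg;
        pthread_mutex_lock(&user_lock);
        int *d = hdev;               /* looked up under the "wrong" lock */
        sleep(1);                    /* ~ copy_from_user() stalled by userfaultfd */
        printf("ioctl path still using hdev=%p, global pointer is now %p\n",
               (void *)d, (void *)hdev);
        pthread_mutex_unlock(&user_lock);
        return NULL;
}

static void *unreg_path(void *arg)   /* ~ hci_sock_dev_event(HCI_DEV_UNREG) */
{
        (void)arg;
        pthread_spin_lock(&bh_lock); /* does NOT wait for the mutex holder */
        hdev = NULL;                 /* detach; the real code would also free */
        pthread_spin_unlock(&bh_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_spin_init(&bh_lock, PTHREAD_PROCESS_PRIVATE);
        pthread_create(&a, NULL, ioctl_path, NULL);
        usleep(100 * 1000);          /* let the ioctl path take its lock first */
        pthread_create(&b, NULL, unreg_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}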