Hi Tetsuo,

Just found out about another interesting function: sock_owned_by_user(). (I am just a noob when it comes to kernel locks.)

Hence I think the following patch has the same 'effect' as the old patch e305509e678b3 ("Bluetooth: use correct lock to prevent UAF of hdev object"):

diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index b04a5a02ecf3..0cc4b88daa96 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -762,7 +762,11 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 		/* Detach sockets from device */
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, &hci_sk_list.head) {
-			lock_sock(sk);
+			bh_lock_sock_nested(sk);
+busywait:
+			if (sock_owned_by_user(sk))
+				goto busywait;
+
 			if (hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
 				sk->sk_err = EPIPE;
@@ -771,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
 
 				hci_dev_put(hdev);
 			}
-			release_sock(sk);
+			bh_unlock_sock(sk);
 		}
 		read_unlock(&hci_sk_list.lock);
 	}

The sad thing is that it seems this will waste CPU resources on a meaningless busy-wait... What do you think? Can this sock_owned_by_user() function be of any help?

Thanks,
Lin Ma

> From: "Tetsuo Handa" <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Sent Time: 2021-07-16 22:48:13 (Friday)
> To: "Desmond Cheong Zhi Xi" <desmondcheongzx@xxxxxxxxx>, LinMa <linma@xxxxxxxxxx>, "Luiz Augusto von Dentz" <luiz.dentz@xxxxxxxxx>, "Johan Hedberg" <johan.hedberg@xxxxxxxxx>, "Marcel Holtmann" <marcel@xxxxxxxxxxxx>
> Cc: "linux-bluetooth@xxxxxxxxxxxxxxx" <linux-bluetooth@xxxxxxxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxxxxx>, "Jakub Kicinski" <kuba@xxxxxxxxxx>, "open list:NETWORKING [GENERAL]" <netdev@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH v3] Bluetooth: call lock_sock() outside of spinlock section
> 
> On 2021/07/16 13:11, Desmond Cheong Zhi Xi wrote:
> > On 16/7/21 11:47 am, Desmond Cheong Zhi Xi wrote:
> >> Saw this and thought I'd offer my two cents.
> >> BUG: sleeping function called from invalid context
> >> This is the original problem that Tetsuo's patch was trying to fix.
> 
> Yes.
> 
> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> because of the mutex_acquire.
> 
> Both lock_sock() and lock_sock_nested() might sleep.
> 
> >> But we shouldn't sleep while holding the rw spinlock.
> 
> Right. In atomic context (e.g. inside an interrupt handler, schedulable context
> with interrupts or preemption disabled, schedulable context inside an RCU read
> critical section, schedulable context inside a spinlock section), we must not
> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> a page fault) which are not atomic.
> 
> >> So we either have to acquire a spinlock instead of a mutex as was done before,
> 
> Regarding the hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
> 
> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> immediately destroys resources associated with this hdev.
> 
> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
> 
> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> This patch should be sent to linux.git and stables as soon as possible. But due to little
> attention on this patch, I'm already testing this patch in linux-next.git via my tree.
> I'll drop it when the Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
> directly send it to Linus?)
> 
> >>
> >
> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> > not hci_sock_dev_event.
> 
> I didn't catch this part. Are you talking about a different poc?
> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
> 
> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> calls copy_from_user(), which might cause a page fault, and the userfaultfd mechanism allows an attacker
> to slow down page fault handling enough for hci_sock_dev_event(HCI_DEV_UNREG) to return without
> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> results in UAF (doesn't it, LinMa?).
> 
> > In this case, it's not clear to me why the atomic context is being violated.
> 
> In atomic context (in the hci_sock_dev_event(HCI_DEV_UNREG) case, between
> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> lock_sock(sk), which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
> 
> >
> > Sorry for the noise.
> >
> >>>
> >>> The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock(), so I also tried that in
> >>>
> >>> --- a/net/bluetooth/hci_sock.c
> >>> +++ b/net/bluetooth/hci_sock.c
> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>>  		/* Detach sockets from device */
> >>>  		read_lock(&hci_sk_list.lock);
> >>>  		sk_for_each(sk, &hci_sk_list.head) {
> >>> +			local_bh_disable();
> >>>  			bh_lock_sock_nested(sk);
> >>>  			if (hci_pi(sk)->hdev == hdev) {
> >>>  				hci_pi(sk)->hdev = NULL;
> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >>>  				hci_dev_put(hdev);
> >>>  			}
> >>>  			bh_unlock_sock(sk);
> >>> +			local_bh_enable();
> >>>  		}
> >>>  		read_unlock(&hci_sk_list.lock);
> >>>  	}
> >>>
> >>> But this is not useful, the UAF still occurs
> >>>
> >>
> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
> 
> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
> 
> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> user contexts and bottom halves,
> 
> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
> 
> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> multiple users.
> 
> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
> 
> >>
> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> sleeping functions aren't called between the bh_lock/unlock.
> 
> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).
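P.S. To double-check my own understanding of the primitives, here is a very rough sketch of how I read lock_sock()/release_sock() and the 'owned' flag that sock_owned_by_user() reports, based on include/net/sock.h and net/core/sock.c around v5.13. The sketch_*() names are made up for illustration; this is not the literal kernel code:

/*
 * Rough sketch of the socket-lock pieces (not the real implementation).
 * bh_lock_sock(sk) is essentially spin_lock(&sk->sk_lock.slock), and
 * sock_owned_by_user(sk) essentially returns sk->sk_lock.owned.
 */
#include <net/sock.h>

static void sketch_lock_sock(struct sock *sk)	/* process context, may sleep */
{
	might_sleep();
	spin_lock_bh(&sk->sk_lock.slock);
	if (sk->sk_lock.owned) {
		/* The real code calls __lock_sock(sk) here, which drops
		 * slock and sleeps on sk->sk_lock.wq until the current
		 * owner calls release_sock(). */
	}
	sk->sk_lock.owned = 1;	/* the flag sock_owned_by_user() reads */
	spin_unlock_bh(&sk->sk_lock.slock);
}

static void sketch_release_sock(struct sock *sk)
{
	spin_lock_bh(&sk->sk_lock.slock);
	/* The real code first processes the backlog that softirq users
	 * queued with sk_add_backlog() while the socket was owned. */
	sk->sk_lock.owned = 0;
	wake_up(&sk->sk_lock.wq);	/* wake sleepers in __lock_sock() */
	spin_unlock_bh(&sk->sk_lock.slock);
}

If that reading is right, then seeing sock_owned_by_user() return true from atomic context only tells us that some process currently owns the socket; the flag by itself does not give the mutual exclusion that lock_sock()/release_sock() provide, so I am not sure the busy-wait in my patch above can really substitute for it.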