Hi there, I'm just exhilarated to see there have been some new ideas to fix this. > > How about we revert back to use bh_lock_sock_nested but use > local_bh_disable like the following patch: > > https://patchwork.kernel.org/project/bluetooth/patch/20210713162838.693266-1-desmondcheongzx@xxxxxxxxx/ > I have checked that patch and learn about some `local_bh_disable/enable` usage. To the best of my knowledge, the local_bh_disable() function can be used to disable the processing of bottom halves (softirqs). Or in another word, if process context function, hci_sock_sendmsg() for example, can mask the BH (hci_dev_do_close()?). It doesn't need to worry about the UAF. However, after doing some experiments, I failed :( For instance, I try to do following patch: --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -1720,6 +1720,7 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg, return -EINVAL; lock_sock(sk); + local_bh_disable(); switch (hci_pi(sk)->channel) { case HCI_CHANNEL_RAW: @@ -1832,7 +1833,9 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg, err = len; done: + local_bh_enable(); release_sock(sk); + return err; But the POC code shows error message like below: [ 18.169155] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:197 [ 18.170181] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: exp [ 18.170987] 1 lock held by exp/120: [ 18.171384] #0: ffff888011dd5120 (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}, at: hci_sock_sendmsg+0x11e/0x26c0 [ 18.172300] CPU: 0 PID: 120 Comm: exp Not tainted 5.11.11+ #44 [ 18.172921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 ... The patch provided by Desmond adds the local_bh_disable() before the bh_lock_sock() so I also try that in --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { + local_bh_disable(); bh_lock_sock_nested(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) hci_dev_put(hdev); } bh_unlock_sock(sk); + local_bh_enable(); } read_unlock(&hci_sk_list.lock); } But this is not useful, the UAF still occurs [ 13.862117] ================================================================== [ 13.863064] BUG: KASAN: use-after-free in __lock_acquire+0xe5/0x2ca0 [ 13.863852] Read of size 8 at addr ffff888011d9aeb0 by task exp/119 [ 13.864620] [ 13.864818] CPU: 0 PID: 119 Comm: exp Not tainted 5.11.11+ #45 [ 13.865543] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 13.866634] Call Trace: [ 13.866947] dump_stack+0x183/0x22e [ 13.867389] ? show_regs_print_info+0x12/0x12 [ 13.867927] ? log_buf_vmcoreinfo_setup+0x45d/0x45d [ 13.868503] ? _raw_spin_lock_irqsave+0xbd/0x100 [ 13.869244] print_address_description+0x7b/0x3a0 [ 13.869828] __kasan_report+0x14e/0x200 [ 13.870288] ? __lock_acquire+0xe5/0x2ca0 [ 13.870768] kasan_report+0x47/0x60 [ 13.871189] __lock_acquire+0xe5/0x2ca0 [ 13.871647] ? lock_acquire+0x168/0x6a0 [ 13.872107] ? trace_lock_release+0x5c/0x120 [ 13.872615] ? do_user_addr_fault+0x9c2/0xdb0 [ 13.873135] ? trace_lock_acquire+0x150/0x150 [ 13.873661] ? rcu_read_lock_sched_held+0x87/0x110 [ 13.874232] ? perf_trace_rcu_barrier+0x360/0x360 [ 13.874790] ? avc_has_perm_noaudit+0x442/0x4c0 [ 13.875332] lock_acquire+0x168/0x6a0 [ 13.875772] ? skb_queue_tail+0x32/0x120 [ 13.876240] ? do_kern_addr_fault+0x230/0x230 [ 13.876756] ? read_lock_is_recursive+0x10/0x10 [ 13.877300] ? exc_page_fault+0xf3/0x1b0 [ 13.877770] ? cred_has_capability+0x191/0x3f0 [ 13.878290] ? cred_has_capability+0x2a1/0x3f0 [ 13.878816] ? rcu_lock_release+0x20/0x20 [ 13.879295] _raw_spin_lock_irqsave+0xb1/0x100 [ 13.879821] ? skb_queue_tail+0x32/0x120 [ 13.880287] ? _raw_spin_lock+0x40/0x40 [ 13.880745] skb_queue_tail+0x32/0x120 [ 13.881194] hci_sock_sendmsg+0x1545/0x26b0 >From my point of view, adding the local_bh_disable() cannot prevent current hci_sock_dev_event() to set and decrease the ref-count. It's not quite similar with the cases that Desmond discussed. (Or maybe just I don't know how to use this). I recently tried to find some similar cases (and I did, reported to security already but get no reply) and figure out how others are fixed. Some guideline tells me that (http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html) "If process context code and a bottom half share data, you need to disable bottom-half processing and obtain a lock before accessing the data. Doing both ensures local and SMP protection and prevents a deadlock." Assuming hci_sock_sendmsg()/hci_sock_bound_ioctl() are the process contexts while the hci_sock_dev_event(), not sure, is the BH context. The fact is that the hci_sock_dev_event() should wait for the process contexts. Hence, I think Tetsuo is on the right way. Regards Lock-Noob LinMa