Hello Linux-Bluetooth, Sometimes after boot following kernel warning is observed: [ 62.793493] refcount_t: underflow; use-after-free. [ 62.799419] WARNING: CPU: 2 PID: 69 at /mnt/host/source/src/third_party/ kernel/v4.19/lib/refcount.c:187 refcount_sub_and_test_checked+0x80/0x8c [ 62.812298] Modules linked in: hidp rfcomm uinput hci_uart btqca bluetooth ecdh_generic mtk_scp mtk_rpmsg mtk_scp_ipi rpmsg_core bridge snd_seq_dummy stp llc snd_seq snd_seq_device lzo_rle lzo_compress nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp esp6 ah6 xfrm6_mode_tunnel xfrm6_mode_transport xfrm4_mode_tunnel xfrm4_mode_transport ip6t_REJECT ip6t_ipv6header zram ipt_MASQUERADE fuse ath10k_sdio ath10k_core ath mac80211 cfg80211 joydev [ 62.852227] CPU: 2 PID: 69 Comm: kworker/2:1 Tainted: G S 4.19.36 #344 [ 62.860057] Hardware name: MediaTek kukui rev1 board (DT) [ 62.865510] Workqueue: events l2cap_chan_timeout [bluetooth] [ 62.871177] pstate: 60000005 (nZCv daif -PAN -UAO) [ 62.875973] pc : refcount_sub_and_test_checked+0x80/0x8c [ 62.881285] lr : refcount_sub_and_test_checked+0x7c/0x8c [ 62.886594] sp : ffffff8008533cc0 [ 62.889907] x29: ffffff8008533cc0 x28: 0000000000000402 [ 62.895227] x27: ffffffaf37f16000 x26: ffffffe33b342d80 [ 62.900547] x25: ffffffe33aa2c210 x24: 0000000000000000 [ 62.905867] x23: ffffffe3294ef910 x22: ffffffe3294efc68 [ 62.911188] x21: ffffffe3294ef910 x20: ffffffe320fe3238 [ 62.916516] x19: ffffffe3294ed000 x18: 0000464806a32cd4 [ 62.921842] x17: 0000000000000400 x16: 0000000000000001 [ 62.927162] x15: 0000000000000000 x14: 0000000000000001 [ 62.932482] x13: 00000000000c001f x12: 0000000000000000 [ 62.937802] x11: 0000000000000001 x10: 0000000000000007 [ 62.943122] x9 : 97fe39c0a1baee00 x8 : 97fe39c0a1baee00 [ 62.948442] x7 : ffffffaf36af114c x6 : 0000000000000000 [ 62.953762] x5 : 0000000000000080 x4 : 0000000000000001 [ 62.959081] x3 : ffffff8008533868 x2 : 0000000000000006 [ 62.964401] x1 : ffffffe33b3436a0 x0 : 0000000000000000 [ 62.969721] Call trace: [ 62.972176] refcount_sub_and_test_checked+0x80/0x8c [ 62.977142] refcount_dec_and_test_checked+0x14/0x20 [ 62.982153] l2cap_sock_kill+0x40/0x58 [bluetooth] [ 62.986974] l2cap_sock_close_cb+0x1c/0x28 [bluetooth] [ 62.992140] l2cap_chan_timeout+0x94/0xb4 [bluetooth] [ 62.997196] process_one_work+0x330/0x65c [ 63.001206] worker_thread+0x2c8/0x3ec [ 63.004957] kthread+0x124/0x134 [ 63.008197] ret_from_fork+0x10/0x18 [ 63.011784] irq event stamp: 50638 [ 63.015203] hardirqs last enabled at (50637): [<ffffffaf37404fa4>] _raw_spin_unlock_irq+0x34/0x68 [ 63.024165] hardirqs last disabled at (50638): [<ffffffaf36a80e4c>] do_debug_exception+0x44/0x16c [ 63.033036] softirqs last enabled at (50632): [<ffffffaf36a81634>] __do_softirq+0x45c/0x4a4 [ 63.041473] softirqs last disabled at (50613): [<ffffffaf36abcf80>] irq_exit+0xd8/0xf8 [ 63.049386] ---[ end trace 91fdf7b9eddd3bb0 ]--- After analyzing the code, we noticed that there is a race condition between l2cap_chan_timeout() and l2cap_sock_release() while killing the socket. There are few more places in l2cap code where this race condition will occur. Issue is reproducible by writing a test which runs connect/disconnect of a bluetooth device in loop. With the help of this test, issue is consistently reproducible just within a couple of hours. To fix this, protect teardown/sock_kill and orphan/sock_kill by adding hold_lock on l2cap channel to ensure that the socket is killed only after it is marked as zapped and orphan. This change was tested by running the above test overnight and verifying that issue is not observed. There are places in sco and rfcomm code as well where similar race condition may occur. We are planning to send separate patches for them as well. Please let us know if this looks like a good generalized solution. Regards, Manish. Manish Mandlik (1): Bluetooth: Fix refcount use-after-free issue net/bluetooth/l2cap_core.c | 26 +++++++++++++++----------- net/bluetooth/l2cap_sock.c | 16 +++++++++++++--- 2 files changed, 28 insertions(+), 14 deletions(-) -- 2.25.0.341.g760bfbb309-goog