https://bugzilla.kernel.org/show_bug.cgi?id=99301 Bug ID: 99301 Summary: socket shutdown of L2CAP ERTM channel causes hung tasks when S or I frame ACK is pending Product: Drivers Version: 2.5 Kernel Version: verified on kernels 3.8 to 4.0.4 and bluetooth-next Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: blocking Priority: P1 Component: Bluetooth Assignee: linux-bluetooth@xxxxxxxxxxxxxxx Reporter: Dean_Jenkins@xxxxxxxxxx Regression: No We have managed to trigger a L2CAP ERTM hung task deadlock using the Bluez l2test utility. So far we have caused L2CAP ERTM deadlocks in: Our own project running a i.MX6 using kernel 3.8 ARM Raspberry Pi running Raspian kernel 3.18.11+ x86 Dell laptop running Mageia 4 kernel 3.14.32 PC running Fedora with kernel 4.0.4 (this was a test with an iPhone) sanjc verified the failure occurs on bluetooth-next git repo. I have sent the kernel.org Bluetooth maintainers instructions on how to trigger the deadlock. http://marc.info/?l=linux-bluetooth&m=143289425224598&w=2 The hung tasks look like the following traces, wait for about 10 minutes (need to wait for 2 times the hung task detector period plus L2CAP timer expiry time) after the initial hidden failure of __l2cap_wait_ack() occurs, dmesg shows: >From ARM Raspberry Pi: [ 601.116119] INFO: task kworker/0:1:19 blocked for more than 120 seconds. [ 601.116156] Not tainted 3.18.11+ #781 [ 601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.116183] kworker/0:1 D c054cbcc 0 19 2 0x00000000 [ 601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth] [ 601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c) [ 601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40) [ 601.116812] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c) [ 601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c) [ 601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>] (l2cap_retrans_timeout+0x24/0x5c [bluetooth]) [ 601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444) [ 601.117410] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580) [ 601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4) [ 601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20) [ 601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 seconds. [ 601.117588] Not tainted 3.18.11+ #781 [ 601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.117614] kworker/u3:2 D c054cbcc 0 327 2 0x00000000 [ 601.117806] Workqueue: hci0 hci_rx_work [bluetooth] [ 601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c) [ 601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40) [ 601.118056] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c) [ 601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c) [ 601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>] (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth]) [ 601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from [<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth]) [ 601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from [<bf058194>] (l2cap_recv_acldata+0x214/0x22c [bluetooth]) [ 601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from [<bf02bf88>] (hci_rx_work+0x180/0x288 [bluetooth]) [ 601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444) [ 601.119839] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580) [ 601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4) [ 601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20) >From x86 Mageia 4: [15942.455042] INFO: task kworker/5:1:54 blocked for more than 120 seconds. [15942.455049] Tainted: G O 3.14.32-desktop-1.mga4 #1 [15942.455051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [15942.455054] kworker/5:1 D ffff88022e352e00 0 54 2 0x00000000 [15942.455102] Workqueue: events l2cap_retrans_timeout [bluetooth] [15942.455104] ffff8800cb9dfd70 0000000000000046 ffff880223ec9800 ffff8800cb9dffd8 [15942.455106] 0000000000012e00 0000000000012e00 ffff880223ec9800 ffff8801a66b2390 [15942.455108] ffff8801a66b2394 ffff880223ec9800 00000000ffffffff ffff8801a66b2398 [15942.455110] Call Trace: [15942.455115] [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70 [15942.455117] [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0 [15942.455119] [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f [15942.455133] [<ffffffffa07395d0>] l2cap_retrans_timeout+0x30/0x80 [bluetooth] [15942.455137] [<ffffffff8108166c>] process_one_work+0x17c/0x420 [15942.455139] [<ffffffff81082281>] worker_thread+0x121/0x3b0 [15942.455141] [<ffffffff81082160>] ? rescuer_thread+0x360/0x360 [15942.455144] [<ffffffff81088922>] kthread+0xd2/0xf0 [15942.455145] [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170 [15942.455148] [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0 [15942.455150] [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170 [15942.455158] INFO: task kworker/u17:3:785 blocked for more than 120 seconds. [15942.455159] Tainted: G O 3.14.32-desktop-1.mga4 #1 [15942.455160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [15942.455161] kworker/u17:3 D ffff88022e212e00 0 785 2 0x00000000 [15942.455172] Workqueue: hci0 hci_rx_work [bluetooth] [15942.455173] ffff880222ec7c10 0000000000000046 ffff88021dce0000 ffff880222ec7fd8 [15942.455175] 0000000000012e00 0000000000012e00 ffff88021dce0000 ffff8801b6421350 [15942.455176] ffff8801b6421354 ffff88021dce0000 00000000ffffffff ffff8801b6421358 [15942.455178] Call Trace: [15942.455181] [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70 [15942.455183] [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0 [15942.455184] [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f [15942.455198] [<ffffffffa07346d7>] l2cap_get_chan_by_scid+0x27/0xa0 [bluetooth] [15942.455212] [<ffffffffa073e1be>] l2cap_recv_frame+0x3ae/0x29c0 [bluetooth] [15942.455215] [<ffffffff810915c8>] ? update_rq_clock.part.88+0x18/0x40 [15942.455229] [<ffffffffa0741be3>] l2cap_recv_acldata+0x2c3/0x350 [bluetooth] [15942.455240] [<ffffffffa0715e96>] hci_rx_work+0x1b6/0x3c0 [bluetooth] [15942.455242] [<ffffffff8108166c>] process_one_work+0x17c/0x420 [15942.455244] [<ffffffff81082281>] worker_thread+0x121/0x3b0 [15942.455246] [<ffffffff81082160>] ? rescuer_thread+0x360/0x360 [15942.455248] [<ffffffff81088922>] kthread+0xd2/0xf0 [15942.455250] [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170 [15942.455252] [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0 [15942.455254] [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170 To reproduce the failure do: USING L2TEST Raspberry Pi or x86 PC is the l2test client (unit under test) laptop is the l2test server and has a BT_ADDR of <lap_BT_ADDR> On the laptop use the command: l2test -i hci0 -X ertm -d This command listens for connections with optional L2CAP ERTM supported and dumps out the received data. On the RPi or 86 PC use the command: l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR> This command sends 1024 bytes to the laptop via PSM 4113 used by l2test over an ERTM channel l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 5C:F3:70:68:BA:DD l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64) l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64) l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, handle 11, class 0x000000, priority 0, rcvbuf 163840] l2test[10366]: Sending ... l2test[10366]: Closing channel ... Using L2CAP ERTM, the channel is instructed to close but hangs. l2test fails to terminate. The defect is caused by __l2cap_wait_ack() when it waits for all pending L2CAP ACKs (S or I frame) but takes all the locks whilst waiting (sleeping) so blocks processing of received L2CAP frames and blocks the recovery L2CAP timers. It causes a deadlock as __l2cap_wait_ack() runs in an infinite loop which periodically sleeps every 200ms. -- You are receiving this mail because: You are the assignee for the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html