[Bug 99301] New: socket shutdown of L2CAP ERTM channel causes hung tasks when S or I frame ACK is pending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=99301

            Bug ID: 99301
           Summary: socket shutdown of L2CAP ERTM channel causes hung
                    tasks when S or I frame ACK is pending
           Product: Drivers
           Version: 2.5
    Kernel Version: verified on kernels 3.8 to 4.0.4 and bluetooth-next
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: Bluetooth
          Assignee: linux-bluetooth@xxxxxxxxxxxxxxx
          Reporter: Dean_Jenkins@xxxxxxxxxx
        Regression: No

We have managed to trigger a L2CAP ERTM hung task deadlock using the Bluez
l2test utility.

So far we have caused L2CAP ERTM deadlocks in:
Our own project running a i.MX6 using kernel 3.8 
ARM Raspberry Pi running Raspian kernel 3.18.11+
x86 Dell laptop running Mageia 4 kernel 3.14.32
PC running Fedora with kernel 4.0.4 (this was a test with an iPhone)
sanjc verified the failure occurs on bluetooth-next git repo.

I have sent the kernel.org Bluetooth maintainers instructions on how to trigger
the deadlock.
http://marc.info/?l=linux-bluetooth&m=143289425224598&w=2

The hung tasks look like the following traces, wait for about 10 minutes (need
to wait for 2 times the hung task detector period plus L2CAP timer expiry time)
after the initial hidden failure of __l2cap_wait_ack() occurs, dmesg shows:

>From ARM Raspberry Pi:
[  601.116119] INFO: task kworker/0:1:19 blocked for more than 120 seconds.
[  601.116156]       Not tainted 3.18.11+ #781
[  601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  601.116183] kworker/0:1     D c054cbcc     0    19      2 0x00000000
[  601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth]
[  601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>]
(schedule_preempt_disabled+0x30/0x40)
[  601.116812] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>]
(__mutex_lock_slowpath+0xb8/0x18c)
[  601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>]
(mutex_lock+0x38/0x3c)
[  601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>]
(l2cap_retrans_timeout+0x24/0x5c [bluetooth])
[  601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from
[<c003aa68>] (process_one_work+0x13c/0x444)
[  601.117410] [<c003aa68>] (process_one_work) from [<c003b954>]
(worker_thread+0x190/0x580)
[  601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>]
(kthread+0xd8/0xf4)
[  601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>]
(ret_from_fork+0x14/0x20)
[  601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 seconds.
[  601.117588]       Not tainted 3.18.11+ #781
[  601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  601.117614] kworker/u3:2    D c054cbcc     0   327      2 0x00000000
[  601.117806] Workqueue: hci0 hci_rx_work [bluetooth]
[  601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>]
(schedule_preempt_disabled+0x30/0x40)
[  601.118056] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>]
(__mutex_lock_slowpath+0xb8/0x18c)
[  601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>]
(mutex_lock+0x38/0x3c)
[  601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>]
(l2cap_get_chan_by_scid+0x28/0x98 [bluetooth])
[  601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from
[<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth])
[  601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from [<bf058194>]
(l2cap_recv_acldata+0x214/0x22c [bluetooth])
[  601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from [<bf02bf88>]
(hci_rx_work+0x180/0x288 [bluetooth])
[  601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from [<c003aa68>]
(process_one_work+0x13c/0x444)
[  601.119839] [<c003aa68>] (process_one_work) from [<c003b954>]
(worker_thread+0x190/0x580)
[  601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>]
(kthread+0xd8/0xf4)
[  601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>]
(ret_from_fork+0x14/0x20)

>From x86 Mageia 4:
[15942.455042] INFO: task kworker/5:1:54 blocked for more than 120 seconds.
[15942.455049]       Tainted: G           O 3.14.32-desktop-1.mga4 #1
[15942.455051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[15942.455054] kworker/5:1     D ffff88022e352e00     0    54      2 0x00000000
[15942.455102] Workqueue: events l2cap_retrans_timeout [bluetooth]
[15942.455104]  ffff8800cb9dfd70 0000000000000046 ffff880223ec9800
ffff8800cb9dffd8
[15942.455106]  0000000000012e00 0000000000012e00 ffff880223ec9800
ffff8801a66b2390
[15942.455108]  ffff8801a66b2394 ffff880223ec9800 00000000ffffffff
ffff8801a66b2398
[15942.455110] Call Trace:
[15942.455115]  [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70
[15942.455117]  [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0
[15942.455119]  [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f
[15942.455133]  [<ffffffffa07395d0>] l2cap_retrans_timeout+0x30/0x80
[bluetooth]
[15942.455137]  [<ffffffff8108166c>] process_one_work+0x17c/0x420
[15942.455139]  [<ffffffff81082281>] worker_thread+0x121/0x3b0
[15942.455141]  [<ffffffff81082160>] ? rescuer_thread+0x360/0x360
[15942.455144]  [<ffffffff81088922>] kthread+0xd2/0xf0
[15942.455145]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455148]  [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0
[15942.455150]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455158] INFO: task kworker/u17:3:785 blocked for more than 120 seconds.
[15942.455159]       Tainted: G           O 3.14.32-desktop-1.mga4 #1
[15942.455160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[15942.455161] kworker/u17:3   D ffff88022e212e00     0   785      2 0x00000000
[15942.455172] Workqueue: hci0 hci_rx_work [bluetooth]
[15942.455173]  ffff880222ec7c10 0000000000000046 ffff88021dce0000
ffff880222ec7fd8
[15942.455175]  0000000000012e00 0000000000012e00 ffff88021dce0000
ffff8801b6421350
[15942.455176]  ffff8801b6421354 ffff88021dce0000 00000000ffffffff
ffff8801b6421358
[15942.455178] Call Trace:
[15942.455181]  [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70
[15942.455183]  [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0
[15942.455184]  [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f
[15942.455198]  [<ffffffffa07346d7>] l2cap_get_chan_by_scid+0x27/0xa0
[bluetooth]
[15942.455212]  [<ffffffffa073e1be>] l2cap_recv_frame+0x3ae/0x29c0 [bluetooth]
[15942.455215]  [<ffffffff810915c8>] ? update_rq_clock.part.88+0x18/0x40
[15942.455229]  [<ffffffffa0741be3>] l2cap_recv_acldata+0x2c3/0x350 [bluetooth]
[15942.455240]  [<ffffffffa0715e96>] hci_rx_work+0x1b6/0x3c0 [bluetooth]
[15942.455242]  [<ffffffff8108166c>] process_one_work+0x17c/0x420
[15942.455244]  [<ffffffff81082281>] worker_thread+0x121/0x3b0
[15942.455246]  [<ffffffff81082160>] ? rescuer_thread+0x360/0x360
[15942.455248]  [<ffffffff81088922>] kthread+0xd2/0xf0
[15942.455250]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455252]  [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0
[15942.455254]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170

To reproduce the failure do:

USING L2TEST
Raspberry Pi or x86 PC is the l2test client (unit under test)
laptop is the l2test server and has a BT_ADDR of <lap_BT_ADDR>

On the laptop use the command:
l2test -i hci0 -X ertm -d
This command listens for connections with optional L2CAP ERTM supported and
dumps out the received data.

On the RPi or 86 PC use the command:
l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR>
This command sends 1024 bytes to the laptop via PSM 4113 used by l2test over an
ERTM channel

l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 5C:F3:70:68:BA:DD
l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, handle 11,
class 0x000000, priority 0, rcvbuf 163840]
l2test[10366]: Sending ...
l2test[10366]: Closing channel ...

Using L2CAP ERTM, the channel is instructed to close but hangs.
l2test fails to terminate.

The defect is caused by __l2cap_wait_ack() when it waits for all pending L2CAP
ACKs (S or I frame) but takes all the locks whilst waiting (sleeping) so blocks
processing of received L2CAP frames and blocks the recovery L2CAP timers. It
causes a deadlock as __l2cap_wait_ack() runs in an infinite loop which
periodically sleeps every 200ms.

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux