Hi Marcel,

This is my 3rd patchset in my series resolving a design flaw in
hci_uart_tty_close() related to the proper closure of the Data Link protocol
layer, which can result in a kernel crash.

This patchset contains a single patch that adds rwlocking around the "proto"
function pointers. This provides a full fix to prevent a potential kernel
crash triggered by hu->proto->close() running before or during the execution
of a non-close "proto" function pointer.

Previous Discussions
====================

Please refer to the discussion on my 2nd patchset V1 that can be found here:
https://www.spinics.net/lists/linux-bluetooth/msg70370.html

You accepted the 3 patches of that patchset, namely:

  Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()

 drivers/bluetooth/hci_ldisc.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Please refer to the discussion on my 1st patchset V3 that can be found here:
https://www.spinics.net/lists/linux-bluetooth/msg70315.html

You accepted the 3 patches of that patchset, namely:

  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY

 drivers/bluetooth/hci_ldisc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Please refer to the discussion on my 1st patchset V2 that can be found here:
https://www.spinics.net/lists/linux-bluetooth/msg70183.html

Please refer to the discussion on my 1st RFC patchset V1 that can be found
here:
https://www.spinics.net/lists/linux-bluetooth/msg70077.html

Analysis
========

There are multiple threads that use "proto" function pointers and that can
potentially run concurrently with the thread of hci_uart_tty_close().
The threads include (not a complete list):

a) hci_uart_write_work() [hci_ldisc.c], which is executed by the work queue
   hu->write_work and creates the callstack:

    hci_uart_write_work()
    hci_uart_dequeue()
    hu->proto->dequeue()

b) hci_tx_work() [hci_core.c], which is executed by the work queue
   hdev->tx_work and creates the callstack:

    hci_tx_work()
    hci_send_frame()
    hdev->send()
    hci_uart_send_frame()
    hu->proto->enqueue()
    then calls
    hci_uart_tx_wakeup()
    schedule_work(&hu->write_work) triggering a)

c) hci_cmd_work() [hci_core.c], which is executed by the work queue
   hdev->cmd_work and creates the callstack:

    hci_cmd_work()
    hci_send_frame()
    hdev->send()
    hci_uart_send_frame()
    hu->proto->enqueue()
    then calls
    hci_uart_tx_wakeup()
    schedule_work(&hu->write_work) triggering a)

d) hci_dev_close() [hci_core.c], which creates the callstack:

    hci_dev_close()
    hci_dev_do_close()
    hdev->flush()
    hci_uart_flush()
    hu->proto->flush()

e) When BCSP is used and a retransmission is needed,
   bcsp_timed_event() [hci_bcsp.c] creates the callstack:

    bcsp_timed_event()
    hci_uart_tx_wakeup()
    schedule_work(&hu->write_work) triggering a)

hci_ldisc.c: hci_uart_tty_close() currently has the construction:

    cancel_work_sync(&hu->write_work);

    if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
        if (hdev) {
            if (test_bit(HCI_UART_REGISTERED, &hu->flags))
                hci_unregister_dev(hdev);
            hci_free_dev(hdev);
        }
        hu->proto->close(hu);
    }

At first glance, the cancel_work_sync(&hu->write_work) would seem to inhibit
hci_uart_write_work() from running. In practice, the call to
cancel_work_sync() can be ineffective because:

a) hci_unregister_dev() calls hci_dev_do_close() and, if the HCI_UP flag is
   set, it is possible for hci_dev_do_close() to send an HCI command, such
   as an HCI RESET command. That causes hci_send_frame() to schedule
   hci_uart_write_work() after cancel_work_sync() has executed.

b) In the case of BCSP retransmissions, the callstack in e) runs
   periodically every 250ms.
   This can occur both before and after cancel_work_sync() has executed.

The call to hu->proto->close(hu) is partially protected by the flag
HCI_UART_PROTO_READY but there is a small race condition.

A race condition exists because hci_uart_tty_close() can clear
HCI_UART_PROTO_READY just after the concurrent thread(s) detected the flag
as set. The race potentially allows the concurrent thread(s) to run a
"proto" function pointer while or after hci_uart_tty_close() calls
hu->proto->close(hu), which could trigger a kernel crash.

Bear in mind that on systems with low resources, such as embedded systems,
threads can get delayed by interrupts and pre-emption. This means that a
concurrent thread might be delayed just after it checks the
HCI_UART_PROTO_READY flag and before it calls the "proto" function pointer.
Therefore, using the flag HCI_UART_PROTO_READY alone is not 100% safe
against race conditions.

Note that the elapsed run-time of hci_unregister_dev() depends on a number
of scenarios. This means hci_unregister_dev() can return quickly or take up
to 10 seconds (a guess) to return. Consequently, the elapsed time between
clearing HCI_UART_PROTO_READY and executing hu->proto->close(hu) is
variable and therefore there is a variable risk of triggering the kernel
crash.

Solution implemented by this patchset
=====================================

Modify hci_ldisc.c: hci_uart_tty_close() to use a rwlock solution:

    cancel_work_sync(&hu->write_work);

    if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
        write_lock_irqsave(&hu->proto_lock, flags);
        clear_bit(HCI_UART_PROTO_READY, &hu->flags);
        write_unlock_irqrestore(&hu->proto_lock, flags);

        if (hdev) {
            if (test_bit(HCI_UART_REGISTERED, &hu->flags))
                hci_unregister_dev(hdev);
            hci_free_dev(hdev);
        }
        hu->proto->close(hu);
    }

The call to write_lock_irqsave(&hu->proto_lock, flags) will wait (spin) for
as long as any concurrent thread holds read_lock(&hu->proto_lock).
This ensures that the flag HCI_UART_PROTO_READY is only cleared when no
"proto" function pointers are running in the concurrent threads. Therefore,
it is not possible for any "proto" function pointers to be running
immediately before, during or after hu->proto->close(hu) is called. This
means the race condition is avoided.

Note that this solution will inhibit the Data Link layer protocol from
independently retransmitting because hu->proto->enqueue() cannot be called
after HCI_UART_PROTO_READY has been cleared.

Test results from running the patchset
======================================

An x86 64-bit test kernel was created based on kernel v4.11.0 plus the
7 patches:

3rd Patchset
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races

2nd Patchset
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()

1st Patchset
  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY

The test used btattach with BCSP, which fails to establish a link because
BCSP is not fully implemented in the kernel. However, this is ideal for
forcing BCSP retransmissions. btattach was killed after 5 seconds of
running to force hci_uart_tty_close() to execute.
Here is a snippet of a dmesg log with dynamic debug enabled:

[ 1039.415914] hci_uart_tty_close: tty ffff9861c691d800
[ 1039.415917] hci_uart_close: hdev ffff98625b422000
[ 1039.415920] hci_uart_flush: hdev ffff98625b422000 tty ffff9861c691d800
[ 1039.415922] bcsp_flush: hu ffff9861c27bd480
[ 1039.415928] hci_unregister_dev: ffff98625b422000 name hci1 bus 3
[ 1039.418767] bcsp_timed_event: hu ffff9861c27bd480 retransmitting 2 pkts
[ 1039.674810] Bluetooth: hci1 command 0x1001 tx timeout
[ 1039.674827] hci_cmd_work: hci1 cmd_cnt 1 cmd queued 1
[ 1039.674832] hci_send_frame: hci1 type 1 len 3
[ 1039.674836] hci_uart_send_frame: hci1: type 1 len 3
[ 1039.674838] Bluetooth: hci1 sending frame failed (-49)
[ 1041.722773] Bluetooth: hci1 command 0x1009 tx timeout
[ 1041.722811] hci_cmd_work: hci1 cmd_cnt 1 cmd queued 0
[ 1045.818847] hci_uart_close: hdev ffff98625b422000
[ 1045.818850] hci_uart_flush: hdev ffff98625b422000 tty ffff9861c691d800
[ 1045.818868] hci_dev_do_close: hci1 ffff98625b422000
[ 1045.819031] hci_conn_params_clear_all: All LE connection parameters were removed
[ 1045.819036] bcsp_close: hu ffff9861c27bd480

The following can be observed from the results:

1. hci_unregister_dev() ran which means HCI_UART_PROTO_READY has been
   cleared.

2. bcsp_timed_event() runs after HCI_UART_PROTO_READY has been cleared and
   so hci_uart_tx_wakeup() no longer appears in the log. This inhibits
   further BCSP retransmission attempts.

3. Bluetooth: hci1 command 0x1001 tx timeout
   This occurs because BCSP was unable to successfully transfer the HCI
   command. BCSP fails because the data plane to and from the UART driver
   is no longer working.

4. Bluetooth: hci1 sending frame failed (-49)
   This occurs because HCI_UART_PROTO_READY has been cleared which inhibits
   a new HCI command being sent from hci_cmd_work().

5. Bluetooth: hci1 command 0x1009 tx timeout
   BCSP never took the HCI command but the HCI core still performs the HCI
   command timeout.
   This means the thread gets blocked for 2 seconds doing nothing.

6. hci_uart_close() runs twice, which seems to be unnecessary. This causes
   hci_uart_flush() to run twice, but notice that the 2nd execution of
   hci_uart_flush() does not run bcsp_flush() because hdev->flush gets set
   to NULL in hci_uart_close(). This suggests a workaround was done in the
   past to prevent a kernel crash.

Outline of potential further work
=================================

With this patchset, the design flaw in hci_uart_tty_close() is further
exposed.

It can be seen that hci_uart_tty_close() is systematically preventing
communications with the Bluetooth Radio Module. But hci_uart_tty_close()
calls hci_unregister_dev() which calls hci_dev_do_close(). If the flag
HCI_UP is set then hci_dev_do_close() assumes that good communications with
the Bluetooth Radio Module are still possible. Therefore, there is a
conflict in the design because the design should not be closing the link
and trying to use the link to send HCI commands at the same time. This can
trigger the protocol error handling procedures, which are unsuccessful.

In addition, it is suspected that before hci_uart_tty_close() runs, the TTY
layer has cut the data plane between the Data Link protocol and the UART
driver. This means that if hci_uart_tty_close() is rearranged to allow
HCI_UART_PROTO_READY to be cleared after hci_unregister_dev() runs then any
HCI command will still fail to reach the Bluetooth Radio Module.

There are other issues in hci_ldisc.c: hci_uart_tty_close()

    /* Detach from the tty */
    tty->disc_data = NULL;

    if (!hu)
        return;

    hdev = hu->hdev;
    if (hdev)
        hci_uart_close(hdev);

It can be seen that tty->disc_data is set to NULL. This causes some of the
HCI UART LDISC functions to not perform any operation, such as the
functions:

    hci_uart_tty_wakeup()
    hci_uart_tty_receive()
    hci_uart_tty_ioctl()

Doing "tty->disc_data = NULL" sabotages the communications with the
Bluetooth Radio Module.
Therefore, this statement needs to be moved to a point after the
communications have been cleanly closed, such as after
hci_unregister_dev() has completed.

Calling hci_uart_close(hdev) is likely to interfere with the communications
with the Bluetooth Radio Module. Therefore, this statement also needs to be
moved to a point after the communications have been cleanly closed, such as
after hci_unregister_dev() has completed.

Potential solution to the design flaw
=====================================

I suspect that a solution would be to get hci_uart_tty_close() to run
hci_unregister_dev() before tearing down the resources needed by the Data
Link protocol layer.

The patch was rebased onto the bluetooth-next master branch HEAD commit:
2da711a Bluetooth: Skip vendor diagnostic configuration for HCI User Channel

Dean Jenkins (1):
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races

 drivers/bluetooth/hci_ldisc.c | 40 +++++++++++++++++++++++++++++++++++-----
 drivers/bluetooth/hci_uart.h  |  1 +
 2 files changed, 36 insertions(+), 5 deletions(-)

-- 
2.7.4