On Fri, Feb 11, 2011 at 5:07 PM, Andrei Warkentin <andreiw@xxxxxxxxxxxx> wrote: > Dear List, > > I've run into an interesting problem. Excuse me in advance if this was > already covered here, or for my explanations, since I'm not too > familiar with overall flow within BlueZ or Bluetooth specifics... > We've had some hardware config issues that resulted in garbage/malformed > messages arriving via H4 into the HCI layer. We've since resolved > these, but it got me thinking. The issues would result in certain HCI > messages being missed, including occasionally disconnect events being > missed, and a subsequent connect event would result in a double add. > > I was thinking about how to fix at the very least the crash. The sysfs > object is created as a last step after getting a "connection > completed" HCI message, I think. What I am unsure about is if it's > safe to just ignore the add if there is already a sysfs entry... > > So I would think the HCI core needs some resiliency against > bad/malignant bluetooth controllers, and perform error > recovery/resynchronization. Perhaps maybe there is room for a virtual > hci controller that just injects various message types to see how well > the core can cope? > > Thanks in advance, > A To further explain the issue, here is what was happening - 0) A BT device is paired. 1) Host goes into sleep mode. 2) BT device turns off. 3) Host wakes up due to BT waking the host. Due to UART resume issues, HCI message corrupted. hci_disconn_complete_evt never gets called. 4) BT device turns on. 5) devref gets incremented in hci_conn_complete_evt, and is now 2. 6) BT device turns off. hci_disconn_complete_evt is called, conn hash is deleted, but sysfs entry not cleaned up since atomic_dec_and_test(&conn->devref) != 0. 7) BT device turns on. sysfs add fails since it never was cleaned up. The attached patch takes care of that. I'm not too familiar with BlueZ (or bluetooth :-(), so I would like your feedback. In particular, I am unsure about sync connections. The primary issue overall is that HCI core doesn't handle HCI issues (whether caused by transport issues, or bad/malicious BT controller). I am curious if there are other ways to break the core. Thanks, A
From 436bf684f8c0b9c92ce7aa21af4f53fa5629bf94 Mon Sep 17 00:00:00 2001 From: Andrei Warkentin <andreiw@xxxxxxxxxxxx> Date: Sat, 12 Feb 2011 01:25:25 -0600 Subject: [PATCH] BlueZ: HCI: Be more resilient to HCI protocol problems. Do not corrupt kernel structs on connect message handling after a missed (due to HCI transport issues or bad BT controller) disconnect event message. Change-Id: I8f461068896f7497f1e0127ea22ae6f07ae876b7 Signed-off-by: Andrei Warkentin <andreiw@xxxxxxxxxxxx> --- net/bluetooth/hci_event.c | 23 +++++++++++++++++++---- 1 files changed, 19 insertions(+), 4 deletions(-) diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index bbb4441..5bf51f0 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -895,8 +895,15 @@ static inline void hci_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *s } else conn->state = BT_CONNECTED; - hci_conn_hold_device(conn); - hci_conn_add_sysfs(conn); + /* We could have somehow not hci_conn_del-eted, due + to errors in the HCI transport. */ + if (atomic_read(&conn->devref) == 0) { + hci_conn_hold_device(conn); + hci_conn_add_sysfs(conn); + } else { + BT_ERR("connection to %s was never torn down", batostr(&ev->bdaddr)); + hci_proto_disconn_cfm(conn, 0x16); + } if (test_bit(HCI_AUTH, &hdev->flags)) conn->link_mode |= HCI_LM_AUTH; @@ -1697,8 +1704,16 @@ static inline void hci_sync_conn_complete_evt(struct hci_dev *hdev, struct sk_bu conn->handle = __le16_to_cpu(ev->handle); conn->state = BT_CONNECTED; - hci_conn_hold_device(conn); - hci_conn_add_sysfs(conn); + /* We could have somehow not hci_conn_del-eted, due + to errors in the HCI transport. */ + if (atomic_read(&conn->devref) == 0) { + hci_conn_hold_device(conn); + hci_conn_add_sysfs(conn); + } else { + BT_ERR("sync connection to %s was never torn down", batostr(&ev->bdaddr)); + hci_proto_disconn_cfm(conn, 0x16); + } + break; case 0x10: /* Connection Accept Timeout */ -- 1.7.0.4