Hi David, Doron & Marcel, On Thu, 2011-08-18 at 07:44 -0400, David Herrmann wrote: > On Thu, Aug 18, 2011 at 1:21 PM, Keren, Doron <doronkeren@xxxxxx> wrote: > > Hi Marcel > > > >> -----Original Message----- > >> From: Marcel Holtmann [mailto:marcel@xxxxxxxxxxxx] > >> Sent: Thursday, August 18, 2011 12:42 AM > >> To: doron.keren.bluez@xxxxxxxxx > >> Cc: linux-bluetooth@xxxxxxxxxxxxxxx; Keren, Doron; Ilia, Kolominsky > >> Subject: Re: [PATCH] Bluetooth-next: Add incremental indexing in sysfs HCI > >> connection name. > >> > >> Hi Doron, > >> > >> > The patch fixes kernel panic which is due to race condition > >> > between the setup of incomming connection and clean-up of the > >> > dead one. Observed in the following case: attached HID device > >> > disconnects unexpectedly (without performing ACL disconnect ), > >> > the device tries to connect again before the ACL link time-out > >> > fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ > >> > events on the same handle, since HCI_DISCONNECT trigers the clean > >> > up of the HID device and handled in different context, the > >> > linking/unlinking connection object to sysfs, may mess up. > >> > > >> > Signed-off-by: Ilia Kolominsky <iliak@xxxxxx> > >> > --- > >> > net/bluetooth/hci_sysfs.c | 4 +++- > >> > 1 files changed, 3 insertions(+), 1 deletions(-) > >> > > >> > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c > >> > index a6c3aa8..5967d63 100644 > >> > --- a/net/bluetooth/hci_sysfs.c > >> > +++ b/net/bluetooth/hci_sysfs.c > >> > @@ -9,6 +9,7 @@ > >> > #include <net/bluetooth/bluetooth.h> > >> > #include <net/bluetooth/hci_core.h> > >> > > >> > +static int acl_conn_index = 0; > >> > static struct class *bt_class; > >> > > >> > struct dentry *bt_debugfs; > >> > @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work) > >> > struct hci_conn *conn = container_of(work, struct hci_conn, > >> work_add); > >> > struct hci_dev *hdev = conn->hdev; > >> > > >> > - dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle); > >> > + acl_conn_index++; > >> > + dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle, > >> acl_conn_index); > >> > > >> > dev_set_drvdata(&conn->dev, conn); > >> > >> can we get a bit more of details on what this is actually trying to > >> solve. I do not like this way of solving it at all. I think it is trying > >> to cover up symptoms and not fixing the real issue. > >> > >> Regards > >> > >> Marcel > >> > > > > The scenario that causes the kernel panic Happens when HID device disconnect and connect fast. When HID device disconnects the name clean-up appears 100-300msec after the "hci_disconn_complete_evt()", because the two L2CAP channels need to be closed first. Doron, It's not because the L2CAP channels need closing - the channels are closed within the hci_disconn_complete_evt() call tree: hci_disconn_complete_evt hci_proto_disconn_cfm l2cap_disconn_cfm l2cap_conn_del l2cap_chan_del The sk->sk_state_change() in l2cap_chan_del wakes up the khid thread which tests the sk_state and exits its processing loop, eventually reaching the hci_conn_put_device as the thread shuts down. The reason for the delay in deleting the sysfs device is that the child HID device needs to be destroyed first to preserve the uevent order to user-space (which is the purpose of hci_conn_hold_device/_put_device). > > The problem is that the base-band has already cleaned the handle number when the "hci_disconn_complete_evt()" sent. If another connection is initiating right after, the base-band will send "hci_conn_request_evt()" with the same handle number. During this time we have situation that two HCI connections has the same name, because the handle number from the base-band is the same. There is no reason for the two HCI connections to share the same resource, name. This duplicate name situation will cause Kernel panic. > > > > The real issue is that the HCI device connection name is saved in the SYSFS > > In the format: "/devices/virtual/bluetooth/hci0/hci0:1" > > The name in this format depends just on the base-band handle that received in the "hci_conn_complete_evt()". The device name is cleaned just when the > > Variable conn->devref becomes 0. > > In the source code: > > if (atomic_dec_and_test(&conn->devref)) > > hci_conn_del_sysfs(conn); > > > > The incremental index in the name format: "/devices/virtual/bluetooth/hci0/hci0:1:<Index>" > > Solves the problem of two HCI connections with the same name. This is the simplest solution. The other way - which is more complicated but preserves naming - is to do away with the hci_conn_hold_device/_put_device, and instead, correctly sequence HID child device destruction within the del_conn() work queue handler - basically impose what the device model should already do. > I can confirm that there is a reconnection issue and I get the same > oops (device "hci0:1" is tried to be added twice) on fast > reconnections. > > However, I tested this fix and Marcel seems right. I no longer get the > sysfs-oops, but I now get another oops in hidp_add_connection() so > this doesn't fix the problem. > After restarting the machine after this oops, my log contains only > half of the oops-message so I can't post this here. I will try to get > a full log and report it then. David, I know you already posted a patch for this (which looks ok for fixing the oops specifically), but it'd still be helpful to see a debug kernel log that led up to that. There's only a couple of reasons -- all bad -- why the hci connection could not be found in the connection list while the ctrl & intr sockets are connected (which are tested just prior to hidp_add_connection). Marcel, It seems to me that hidp_get_device, rfcomm_get_device & bnep_get_device are unsafe. Elsewhere, concurrent access to the conn_hash list is protected by acquiring the device lock (hci_dev_lock/_unlock and _bh variants). Can you confirm my understanding here? Regards, Peter Hurley ��.n��������+%������w��{.n�����{����^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�