Re: [PATCH] Bluetooth-next: Add incremental indexing in sysfs HCI connection name.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David, Doron & Marcel,

On Thu, 2011-08-18 at 07:44 -0400, David Herrmann wrote:
> On Thu, Aug 18, 2011 at 1:21 PM, Keren, Doron <doronkeren@xxxxxx> wrote:
> > Hi Marcel
> >
> >> -----Original Message-----
> >> From: Marcel Holtmann [mailto:marcel@xxxxxxxxxxxx]
> >> Sent: Thursday, August 18, 2011 12:42 AM
> >> To: doron.keren.bluez@xxxxxxxxx
> >> Cc: linux-bluetooth@xxxxxxxxxxxxxxx; Keren, Doron; Ilia, Kolominsky
> >> Subject: Re: [PATCH] Bluetooth-next: Add incremental indexing in sysfs HCI
> >> connection name.
> >>
> >> Hi Doron,
> >>
> >> > The patch fixes kernel panic which is due to race condition
> >> > between the setup of incomming connection and clean-up of the
> >> > dead one. Observed in the following case: attached HID device
> >> > disconnects unexpectedly (without performing ACL disconnect ),
> >> > the device tries to connect again before the ACL link time-out
> >> > fires, this translates to the HCI_DISCONNECT, HCI_CONNECT_REQ
> >> > events on the same handle, since HCI_DISCONNECT trigers the clean
> >> > up of the HID device and handled in different context, the
> >> > linking/unlinking connection object to sysfs, may mess up.
> >> >
> >> > Signed-off-by: Ilia Kolominsky <iliak@xxxxxx>
> >> > ---
> >> >  net/bluetooth/hci_sysfs.c |    4 +++-
> >> >  1 files changed, 3 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
> >> > index a6c3aa8..5967d63 100644
> >> > --- a/net/bluetooth/hci_sysfs.c
> >> > +++ b/net/bluetooth/hci_sysfs.c
> >> > @@ -9,6 +9,7 @@
> >> >  #include <net/bluetooth/bluetooth.h>
> >> >  #include <net/bluetooth/hci_core.h>
> >> >
> >> > +static int acl_conn_index = 0;
> >> >  static struct class *bt_class;
> >> >
> >> >  struct dentry *bt_debugfs;
> >> > @@ -91,7 +92,8 @@ static void add_conn(struct work_struct *work)
> >> >     struct hci_conn *conn = container_of(work, struct hci_conn,
> >> work_add);
> >> >     struct hci_dev *hdev = conn->hdev;
> >> >
> >> > -   dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
> >> > +   acl_conn_index++;
> >> > +   dev_set_name(&conn->dev, "%s:%d:%d", hdev->name, conn->handle,
> >> acl_conn_index);
> >> >
> >> >     dev_set_drvdata(&conn->dev, conn);
> >>
> >> can we get a bit more of details on what this is actually trying to
> >> solve. I do not like this way of solving it at all. I think it is trying
> >> to cover up symptoms and not fixing the real issue.
> >>
> >> Regards
> >>
> >> Marcel
> >>
> >
> > The scenario that causes the kernel panic Happens when HID device disconnect and connect fast. When HID device disconnects the name clean-up appears 100-300msec after the "hci_disconn_complete_evt()", because the two L2CAP channels need to be closed first.

Doron,
It's not because the L2CAP channels need closing - the channels are
closed within the hci_disconn_complete_evt() call tree:
   hci_disconn_complete_evt
      hci_proto_disconn_cfm
         l2cap_disconn_cfm
            l2cap_conn_del
               l2cap_chan_del
The sk->sk_state_change() in l2cap_chan_del wakes up the khid thread
which tests the sk_state and exits its processing loop, eventually
reaching the hci_conn_put_device as the thread shuts down.

The reason for the delay in deleting the sysfs device is that the child
HID device needs to be destroyed first to preserve the uevent order to
user-space (which is the purpose of hci_conn_hold_device/_put_device).

> > The problem is that the base-band has already cleaned the handle number when the "hci_disconn_complete_evt()" sent. If another connection is initiating right after, the base-band will send "hci_conn_request_evt()" with the same handle number. During this time we have situation that two HCI connections has the same name, because the handle number from the base-band is the same. There is no reason for the two HCI connections to share the same resource, name. This duplicate name situation will cause Kernel panic.
> >
> > The real issue is that the HCI device connection name is saved in the SYSFS
> > In the format: "/devices/virtual/bluetooth/hci0/hci0:1"
> > The name in this format depends just on the base-band handle that received in the "hci_conn_complete_evt()". The device name is cleaned just when the
> > Variable conn->devref becomes 0.
> > In the source code:
> >    if (atomic_dec_and_test(&conn->devref))
> >                hci_conn_del_sysfs(conn);
> >
> > The incremental index in the name format: "/devices/virtual/bluetooth/hci0/hci0:1:<Index>"
> > Solves the problem of two HCI connections with the same name.

This is the simplest solution. The other way - which is more complicated
but preserves naming - is to do away with the
hci_conn_hold_device/_put_device, and instead, correctly sequence HID
child device destruction within the del_conn() work queue handler -
basically impose what the device model should already do.

> I can confirm that there is a reconnection issue and I get the same
> oops (device "hci0:1" is tried to be added twice) on fast
> reconnections.
> 
> However, I tested this fix and Marcel seems right. I no longer get the
> sysfs-oops, but I now get another oops in hidp_add_connection() so
> this doesn't fix the problem.
> After restarting the machine after this oops, my log contains only
> half of the oops-message so I can't post this here. I will try to get
> a full log and report it then.

David,
I know you already posted a patch for this (which looks ok for fixing
the oops specifically), but it'd still be helpful to see a debug kernel
log that led up to that. There's only a couple of reasons -- all bad --
why the hci connection could not be found in the connection list while
the ctrl & intr sockets are connected (which are tested just prior to
hidp_add_connection).

Marcel,
It seems to me that hidp_get_device, rfcomm_get_device & bnep_get_device
are unsafe. Elsewhere, concurrent access to the conn_hash list is
protected by acquiring the device lock (hci_dev_lock/_unlock and _bh
variants). Can you confirm my understanding here?

Regards,
Peter Hurley
��.n��������+%������w��{.n�����{����^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�

[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux