Re: can we disable/enable eSCO by using setsockopt of sco socket?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Marcel,

I've merged the 2.6.33-rc3 into our code base and made extensive tests
in the past 2 days. The problem is still there if I tried long enough.
But it could be my own problem because the merge is manually
conducted. Our code is 2.6.29 and has a couple of changes so I have to
add the commit 9eba32b86d17ef87131fa0bce43c614904ab5781 with some
minor tailoring.

However, I doubt the duplicate sysfs entry doesn't lead to our kernel
panic. I revert the code back to official 2.6.30 where retry logic was
first introduced and simply add the error code 0x10(Connection Accept
Timeout Exceeded) for retry(see modified code snippet at the end) on
the code snippet for retrying SCO right now. By using hcitool to
monitor the connections, I found the code will eventually result in 3
connection handles while the eSCO one seems to be invalid.

# /data/local/hcitool con
Connections:
< eSCO 00:50:CD:20:BA:E6 handle 0 state 5 lm SLAVE  (note the handle
and the state)
< SCO 00:50:CD:20:BA:E6 handle 513 state 1 lm SLAVE
> ACL 00:50:CD:20:BA:E6 handle 1 state 1 lm MASTER AUTH ENCRYPT

While the phenomenon appears due to error 0x10 and with our code, I am
wondering the same will happen if the hci_sync_conn_complete_evt() is
called with error 0x1c and 0x1f in the official code. As you can see,
the snippet just skips hci_conn_del(conn) and goto unlock:. With the
invalid eSCO handle, when I power off bluetooth, sometimes a kernel
panic will happen(the panic backtrace appended at the end of the
message,too). Now my question is that shall we call hci_conn_del(conn)
for the failed eSCO attempt before  retry SCO?

Regarding extend the SCO socket parameters to allow specifying a
eSCO/SCO packet mask, do you want us to provide a patch as well?

Regards,
Liang Bao

<<the backtrace of the kernel panic>>

<4>[ 7691.791656] Backtrace:

<4>[ 7691.794799] [<c03663f8>] (hci_send_to_sock+0x0/0x17c) from
[<c0362078>] (hci_si_event+0xd0/0xe0)

<4>[ 7691.805480] [<c0361fa8>] (hci_si_event+0x0/0xe0) from
[<c03657e8>] (hci_sock_dev_event+0x38/0x148)

<4>[ 7691.816040] [<c03657b0>] (hci_sock_dev_event+0x0/0x148) from
[<c007caec>] (notifier_call_chain+0x34/0x78)

<4>[ 7691.827545] [<c007cab8>] (notifier_call_chain+0x0/0x78) from
[<c007d104>] (__atomic_notifier_call_chain+0x34/0x60)

<4>[ 7691.839843] [<c007d0d0>] (__atomic_notifier_call_chain+0x0/0x60)
from [<c007d150>] (atomic_notifier_call_chain+0x20/0x28)

<4>[ 7691.852783]  r5:c96e1218 r4:c96e1000

<4>[ 7691.857666] [<c007d130>] (atomic_notifier_call_chain+0x0/0x28)
from [<c035e9a8>] (hci_notify+0x18/0x20)

<4>[ 7691.868713] [<c035e990>] (hci_notify+0x0/0x20) from [<c035ec24>]
(hci_dev_do_close+0xcc/0x1f0)

<4>[ 7691.879180] [<c035eb58>] (hci_dev_do_close+0x0/0x1f0) from
[<c035f4d4>] (hci_dev_close+0x20/0x60)

<4>[ 7691.889953]  r7:400448ca r6:00000000 r5:c62c4c00 r4:c96e1000

<4>[ 7691.897369] [<c035f4b4>] (hci_dev_close+0x0/0x60) from
[<c03660cc>] (hci_sock_ioctl+0x100/0x284)

<4>[ 7691.908050]  r5:c62c4c00 r4:400448ca

<4>[ 7691.912658] [<c0365fcc>] (hci_sock_ioctl+0x0/0x284) from
[<c02ac90c>] (sock_ioctl+0x1f0/0x240)

<4>[ 7691.922882]  r7:400448ca r6:00000000 r5:00000000 r4:400448ca

<4>[ 7691.930541] [<c02ac71c>] (sock_ioctl+0x0/0x240) from
[<c00ccdbc>] (vfs_ioctl+0x34/0x94)

<4>[ 7691.940124]  r7:400448ca r6:00000000 r5:c02ac71c r4:c97d04c0

<4>[ 7691.947784] [<c00ccd88>] (vfs_ioctl+0x0/0x94) from [<c00cd41c>]
(do_vfs_ioctl+0x508/0x56c)

<4>[ 7691.957916]  r7:0000006b r6:400448ca r5:c97d04c0 r4:00000000

<4>[ 7691.965332] [<c00ccf14>] (do_vfs_ioctl+0x0/0x56c) from
[<c00cd4c0>] (sys_ioctl+0x40/0x64)

<4>[ 7691.975372]  r9:c9da2000 r8:c0037004 r7:0000006b r6:400448ca r5:00000000

<4>[ 7691.983947] r4:c97d04c0

<4>[ 7691.987548] [<c00cd480>] (sys_ioctl+0x0/0x64) from [<c0036e80>]
(ret_fast_syscall+0x0/0x2c)

<4>[ 7691.997497]  r7:00000036 r6:0000006b r5:aea00d30 r4:aea010c8

<4>[ 7692.005157] Code: e5955000 e3550000 0a000006 e1a04005 (e4143008)

<0>[ 7692.014282] Kernel panic - not syncing: Fatal exception


<< the code I am currently using, one line insertion on official 2.6.30 >>

1670static inline void hci_sync_conn_complete_evt(struct hci_dev
*hdev, struct sk_buff *skb)
1671{
1672        struct hci_ev_sync_conn_complete *ev = (void *) skb->data;
1673        struct hci_conn *conn;
1674
1675        BT_DBG("%s status %d", hdev->name, ev->status);
1676
1677        hci_dev_lock(hdev);
1678
1679        conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr);
1680        if (!conn) {
1681                if (ev->link_type == ESCO_LINK)
1682                        goto unlock;
1683
1684                conn = hci_conn_hash_lookup_ba(hdev, ESCO_LINK,
&ev->bdaddr);
1685                if (!conn)
1686                        goto unlock;
1687
1688                conn->type = SCO_LINK;
1689        }
1690
1691        switch (ev->status) {
1692        case 0x00:
1693                conn->handle = __le16_to_cpu(ev->handle);
1694                conn->state  = BT_CONNECTED;
1695
1696                hci_conn_add_sysfs(conn);
1697                break;
1698
++++        case 0x10:      /* Connection accept timeout exceeded */
1699        case 0x1c:      /* SCO interval rejected */
1700        case 0x1f:      /* Unspecified error */
1701                if (conn->out && conn->attempt < 2) {
1702                        conn->pkt_type = (hdev->esco_type & SCO_ESCO_MASK) |
1703                                        (hdev->esco_type & EDR_ESCO_MASK);
1704                        hci_setup_sync(conn, conn->link->handle);
1705                        goto unlock;
1706                }
1707                /* fall through */
1708
1709        default:
1710                conn->state = BT_CLOSED;
1711                break;
1712        }
1713
1714        hci_proto_connect_cfm(conn, ev->status);
1715        if (ev->status)
1716                hci_conn_del(conn);
1717
1718unlock:
1719        hci_dev_unlock(hdev);
1720}


2010/1/13 Marcel Holtmann <marcel@xxxxxxxxxxxx>:
> Hi Liang,
>
>> >> We resumed the work of troubleshooting some problematic carkits
>> >> recently. While we agree with that eSCO/SCO is handled by the stuff
>> >> below HCI and bluez is absoultely right handling this, we still need
>> >> to face the fact that there're bunch of carkits and headsets with
>> >> hidden issues which only get exposed in certain combination like the
>> >> HF850 and our phones. Bluez can still be even better.
>> >>
>> >> In our case, the HF850 claims eSCO but when you're trying to connect
>> >> with it, it will timeout. So to have the stack works with those
>> >> on-market products, it's better if the developers sitting above the
>> >> HCI layer have some tweaking method on bluez behavior in ununsal
>> >> cases.
>> >>
>> >> Regarding the particular eSCO timeout failure, we add the following
>> >> delta into the net/bluetooth/hci_event.c
>> >>
>> >>         hci_proto_connect_cfm(conn, ev->status);
>> >> +
>> >> +    if (conn->out && (ev->status == 0x1a || ev->status == 0x1c ||
>> >> +            ev->status == 0x1f || ev->status == 0x10) &&
>> >> +            conn->attempt < 2) {
>> >> +        conn->pkt_type = (hdev->esco_type & SCO_ESCO_MASK) |
>> >> +                    (hdev->esco_type & EDR_ESCO_MASK);
>> >> +        hci_setup_sync(conn, conn->link->handle);
>> >> +        goto unlock;
>> >> +    }
>> >> +
>> >>       if (ev->status)
>> >>               hci_conn_del(conn);
>> >>
>> >> With this code change, we can successfully retry a SCO link after eSCO
>> >> attempt timeout. Still, a setsockopt interface will be better for
>> >> tweaking. Or is there any better idea on how to allow bluez work with
>> >> those on-market problematic devices?
>> >
>> > no you can NOT. It is race condition hazard. You can not call
>> > hci_proto_connect_cfm() and then just hci_setup_sync(). That confuses
>> > upper layers and just works by pure luck.
>> >
>> > The upstream code actually handles already the 0x1c and 0x1f error codes
>> > and retries the setup without eSCO packets types. So you snippet is
>> > against an old kernel (older than 2.6.30 at least).
>> >
>> Yeah, you're right, we're using version 2.6.29. I put the retry code between
>>
>>     hci_proto_connect_cfm(conn, ev->status)
>>
>> and
>>
>>     if (ev->status)
>>          hci_conn_del(conn);
>>
>> because the kernel log is showing a warning complaining duplicate
>> device files is being added( log put at the end to make reading easy).
>> It seems the device file is created twice. I noticed a call to
>> hci_conn_hold_device(conn) is added to kernel 2.6.30 and onwards. I'll
>> try the new way tomorrow.
>
> that should be fixed. Please re-test with bluetooth-testing tree based
> on 2.6.33-rc3.
>
>> >> The hcidump log is copied again here for your convenience:
>> >> > 2009-09-03 17:26:29.093601 < HCI Command: Setup Synchronous Connection (0x01|0x0
>> >> > 028) plen 17
>> >> >    handle 1 voice setting 0x0060
>> >> > 2009-09-03 17:26:29.094608 > HCI Event: Command Status (0x0f) plen 4
>> >> >    Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1
>> >> > 2009-09-03 17:26:34.190952 > HCI Event: Synchronous Connect Complete (0x2c) plen
>> >> >  17
>> >> >    status 0x10 handle 257 bdaddr 00:50:CD:20:BA:E6 type eSCO
>> >> >    Error: Connection Accept Timeout Exceeded
>> >
>> > The accept timeout has a pretty clear definition and while we could
>> > retry, I am having some concerns here. This also does not explain why
>> > 0x1a is in your code snippet.
>> We conduct a retry for the code 0x1a because there's also some devices
>> claim eSCO capability but respond with "Unsupported Remote Feature /
>> Unsupported LMP Feature (0X1A)".
>
> So adding 0x1a to the exception list and falling back to SCO is fine
> with me. Send a proper patch against the bluetooth-testing tree with a
> detailed commit message and the extract of hcidump -X -V for this one.
> You will find an example on how such a commit message should look like
> when looking through the history.
>
>> We can understand your concern for 0x10 as it's a timeout by
>> definition, therefore, if a retry will be done, probably the same set
>> of parameters shall be used. However, the reality is that devices came
>> to market around eSCO was introduced could be troubles - eSCO was
>> added but not fully tested or failed to pass test and finally device
>> manufacturer had to turn off them but inconsistency was kept - like
>> what the Motorola HF850 does. Anyway, it's your call to determine the
>> behavior for the 0x10(Connection Accept Timeout Exceeded) and
>> personally I'll respect that.
>
> If we really wanna work around this problem, then it might be better to
> extend the SCO socket parameters to allow specifying a eSCO/SCO packet
> mask. So it is up to the userspace to select any quirks.
>
> Regards
>
> Marcel
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux