Re: kernel panic happens when disconnecting Bluetooth headset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Thu, Sep 17, 2009 at 5:17 AM, Marcel Holtmann <marcel@xxxxxxxxxxxx> wrote:
> Hi Nick,
>
>> >>> >> We met a issue that kernel panic happens when disconnecting some kinds
>> >>> >> of Bluetooth headset, then we did some analysis and made some changes
>> >>> >> on kernel code which have avoided the panic happening. Would you
>> >>> >> please help to check if our analysis and fix is correct?
>> >>> >>
>> >>> >> =============
>> >>> >> Issue description
>> >>> >> =============
>> >>> >> On Android platform(kernel 2.6.29), disconnecting Bluetooth headset
>> >>> >> may cause kernel panic on certain conditions.
>> >>> >>
>> >>> >> (Pre-condition is android paired with headset.)
>> >>> >> Initiate the connection from android, disconnect it from android, result is OK.
>> >>> >> Initiate the connection from android, disconnect it from headset, result is OK.
>> >>> >> Initiate the connection from headset, disconnect it from headset, result is OK.
>> >>> >> Initiate the connection from headset, disconnect it from android, for
>> >>> >> Motorola H12 headset, result is OK.
>> >>> >> Initiate the connection from headset, disconnect it from android, for
>> >>> >> Motorola H620/560 headset, result is kernel panic.
>> >>> >>
>> >>> >> =============
>> >>> >> Kernel panic point
>> >>> >> =============
>> >>> >> kernel panic at __list_del() in the function rfcomm_session_del() ,
>> >>> >> panic reason is "Unable to handle kernel paging request at virtual
>> >>> >> address 00200200"
>> >>> >>
>> >>> >> =============
>> >>> >> Kernel log analysis
>> >>> >> =============
>> >>> >> rfcomm_session_del() is still called after the session entry is
>> >>> >> removed from the list. Then __list_del() will cause kernel panic
>> >>> >> because of the incorrect pointer. This situation occurs when calling
>> >>> >> rfcomm_recv_ua() when the socket state is BT_CLOSED . So we need to
>> >>> >> find out why the socket state become BT_CLOSED before we calling
>> >>> >> rfcomm_recv_ua().
>> >>> >>
>> >>> >> # [  171.677429] rfcomm_sock_sendmsg: sock ce9a0960, sk cc5c4c00
>> >>> >> [  171.683532] rfcomm_dlc_send: dlc cd3fe920 mtu 255 len 14
>> >>> >> [  171.689422] rfcomm_process_rx: session cc751be0 state 1 qlen 0
>> >>> >> [  171.695709] rfcomm_process_rx: @@@ @@@ sk_state = 1
>> >>> >> [  171.701110] rfcomm_process_dlcs: session cc751be0 state 1
>> >>> >> [  171.706939] rfcomm_process_tx: dlc cd3fe920 state 1 cfc 40
>> >>> >> rx_credits 33 tx_credits 31
>> >>> >> [  171.715515] rfcomm_send_frame: session cc751be0 len 18
>> >>> >> [  171.721130] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 3
>> >>> >> [  174.127807] rfcomm_sock_sendmsg: sock ce9a0960, sk cc5c4c00
>> >>> >> [  174.134490] rfcomm_dlc_send: dlc cd3fe920 mtu 255 len 6
>> >>> >> [  174.141540] rfcomm_process_rx: session cc751be0 state 1 qlen 0
>> >>> >> [  174.148498] rfcomm_process_rx: @@@ @@@ sk_state = 1
>> >>> >> [  174.154968] rfcomm_process_dlcs: session cc751be0 state 1
>> >>> >> [  174.161437] rfcomm_process_tx: dlc cd3fe920 state 1 cfc 40
>> >>> >> rx_credits 33 tx_credits 30
>> >>> >> [  174.171173] rfcomm_send_frame: session cc751be0 len 10
>> >>> >> [  174.177642] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 3
>> >>> >> [  174.205932] rfcomm_sock_release: sock ce9a0960, sk cc5c4c00
>> >>> >> [  174.212707] rfcomm_sock_shutdown: sock ce9a0960, sk cc5c4c00
>> >>> >> [  174.220031] __rfcomm_sock_close: sk cc5c4c00 state 1 socket ce9a0960
>> >>> >> [  174.227508] __rfcomm_dlc_close: dlc cd3fe920 state 1 dlci 20 err 0
>> >>> >> session cc751be0
>> >>> >> [  174.236877] rfcomm_send_disc: cc751be0 dlci 20
>> >>> >> [  174.242706] rfcomm_send_frame: session cc751be0 len 4
>> >>> >> [  174.248962] rfcomm_dlc_set_timer: dlc cd3fe920 state 8 timeout 2560
>> >>> >> [  174.256835] rfcomm_sock_kill: sk cc5c4c00 state 1 refcnt 2
>> >>> >> [  174.263336] rfcomm_sock_destruct: sk cc5c4c00 dlc cd3fe920
>> >>> >> [  174.399444] rfcomm_l2data_ready: ccf70400 bytes 4
>> >>> >> [  174.404724] rfcomm_process_rx: session cc751be0 state 1 qlen 1
>> >>> >> [  174.411010] rfcomm_process_rx: @@@ @@@ sk_state = 1
>> >>> >> [  174.416412] rfcomm_recv_ua: session cc751be0 state 1 dlci 20
>> >>> >> [  174.422515] __rfcomm_dlc_close: dlc cd3fe920 state 9 dlci 20 err 0
>> >>> >> session cc751be0
>> >>> >> [  174.430816] rfcomm_dlc_clear_timer: dlc cd3fe920 state 9
>> >>> >> [  174.436553] rfcomm_dlc_unlink: dlc cd3fe920 refcnt 1 session cc751be0
>> >>> >> [  174.443572] rfcomm_dlc_free: cd3fe920
>> >>> >> [  174.447570] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 3
>> >>> >> [  174.454528] rfcomm_send_disc: cc751be0 dlci 0
>> >>> >> [  174.459259] rfcomm_send_frame: session cc751be0 len 4
>> >>> >> [  174.464904] rfcomm_process_dlcs: session cc751be0 state 8
>> >>> >> [  174.470703] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 2
>> >>> >> [  174.898284] rfcomm_l2data_ready: ccf70400 bytes 4
>> >>> >> [  174.903442] rfcomm_l2state_change: ccf70400 state 9
>> >>> >> [  174.908874] rfcomm_process_rx: session cc751be0 state 8 qlen 1
>> >>> >> [  174.915130] rfcomm_process_rx: @@@ @@@ sk_state = 9
>> >>> >> [  174.920562] rfcomm_recv_ua: session cc751be0 state 8 dlci 0
>> >>> >> [  174.926574] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 2
>> >>> >> [  174.933532] rfcomm_process_rx: @@@ @@@ sk_state == BT_CLOSED , s->initiator=0
>> >>> >> [  174.941253] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 1
>> >>> >> [  174.948211] rfcomm_session_del: session cc751be0 state 8
>> >>> >> [  174.953918] @@@@ in rfcomm_session_del()
>> >>> >> [  174.958312] @@@@ s->list = cc751be0
>> >>> >> [  174.962097] @@@@ s->list.next = ccbfe9a0
>> >>> >> [  174.966369] @@@@ s->list.prev = c047d524
>> >>> >> [  174.970733] @@@@ list is valid, call list_del()
>> >>> >> [  174.975646] @@@@ after list_del()
>> >>> >> [  174.979278] @@@@ s->list = cc751be0
>> >>> >> [  174.983184] @@@@ s->list.next = 00100100
>> >>> >> [  174.987457] @@@@ s->list.prev = 00200200
>> >>> >> [  174.991729] rfcomm_session_close: session cc751be0 state 8 err 104
>> >>> >> [  174.998504] rfcomm_session_put: in rfcomm_session_put, s->refcnt = 1
>> >>> >> [  175.005310] rfcomm_session_del: session cc751be0 state 9
>> >>> >> [  175.011169] @@@@ in rfcomm_session_del()
>> >>> >> [  175.015441] @@@@ s->list = cc751be0
>> >>> >> [  175.019409] @@@@ s->list.next = 00100100
>> >>> >> [  175.023651] @@@@ s->list.prev = 00200200
>> >>> >> [  175.027923] @@@@ list is valid, call list_del()
>> >>> >> [  175.032958] Unable to handle kernel paging request at virtual
>> >>> >> address 00200200
>> >>> >> [  175.040679] pgd = c0004000
>> >>> >> [  175.043792] [00200200] *pgd=00000000
>> >>> >> [  175.047821] Internal error: Oops: 817 [#1]
>> >>> >> [  175.052246] Modules linked in:
>> >>> >> [  175.055725] CPU: 0    Not tainted  (2.6.29-omap1-dirty #34)
>> >>> >> [  175.061859] PC is at rfcomm_session_del+0x6c/0x108
>> >>> >> [  175.067047] LR is at release_console_sem+0x190/0x1a0
>> >>> >> [  175.072509] pc : [<c033ded8>]    lr : [<c0066308>]    psr: 60000013
>> >>> >> [  175.072509] sp : cc1abf38  ip : cc1abe68  fp : cc1abf4c
>> >>> >> [  175.084960] r10: cc751c04  r9 : c036d2fc  r8 : cc751be0
>> >>> >> [  175.090545] r7 : 00000068  r6 : cc751c04  r5 : 00000009  r4 : cc751be0
>> >>> >> [  175.097656] r3 : 00100100  r2 : 00100100  r1 : 00200200  r0 : c0422876
>> >>> >>
>> >>> >> =============
>> >>> >> HCI log analysis
>> >>> >> =============
>> >>> >> Compare the hcidump log of the correct case with the one of the panic
>> >>> >> case, we found there is only one difference in the message sequence.
>> >>> >> In the panic case, headset send L2CAP Disconn_Req immediately after
>> >>> >> sending rfcomm UA frame to android. We think this is the reason that
>> >>> >> cause the socket state become BT_CLOSED.
>> >>> >>
>> >>> >> Please compare these two log, pay attention to the message direction
>> >>> >> of the last Disconn_Req.
>> >>> >>
>> >>> >>
>> >>> >> Log of correct case:
>> >>> >> ----------------------------
>> >>> >>
>> >>> >>
>> >>> >> 009-09-10 15:27:28.963519 < ACL data: handle 1 flags 0x02 dlen 22
>> >>> >>    L2CAP(d): cid 0x0047 len 18 [psm 3]
>> >>> >>      RFCOMM(d): UIH: cr 0 dlci 20 pf 0 ilen 14 fcs 0xeb
>> >>> >>      0000: 0d 0a 2b 43 49 45 56 3a  20 37 2c 33 0d 0a        ..+CIEV: 7,3..
>> >>> >> 009-09-10 15:27:28.967272 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >>
>> >>> >>    handle 1 packets 1
>> >>> >> 009-09-10 15:27:29.243945 < ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>    L2CAP(d): cid 0x0047 len 4 [psm 3]
>> >>> >>      RFCOMM(s): DISC: cr 0 dlci 20 pf 1 ilen 0 fcs 0x7d
>> >>> >> 009-09-10 15:27:29.247363 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >>
>> >>> >>    handle 1 packets 1
>> >>> >> 009-09-10 15:27:29.274890 > ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>    L2CAP(d): cid 0x0040 len 4 [psm 3]
>> >>> >>      RFCOMM(s): UA: cr 0 dlci 20 pf 1 ilen 0 fcs 0x57
>> >>> >> 009-09-10 15:27:29.296343 < ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>    L2CAP(d): cid 0x0047 len 4 [psm 3]
>> >>> >>      RFCOMM(s): DISC: cr 0 dlci 0 pf 1 ilen 0 fcs 0x9c
>> >>> >> 009-09-10 15:27:29.298480 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >>
>> >>> >>    handle 1 packets 1
>> >>> >> 009-09-10 15:27:29.319873 > ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>    L2CAP(d): cid 0x0040 len 4 [psm 3]
>> >>> >>      RFCOMM(s): UA: cr 0 dlci 0 pf 1 ilen 0 fcs 0xb6
>> >>> >> 009-09-10 15:27:29.320727 < ACL data: handle 1 flags 0x02 dlen 12
>> >>> >>    L2CAP(s): Disconn req: dcid 0x0047 scid 0x0040
>> >>> >> 009-09-10 15:27:29.323474 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >>
>> >>> >>    handle 1 packets 1
>> >>> >> 009-09-10 15:27:29.337237 > ACL data: handle 1 flags 0x02 dlen 12
>> >>> >>    L2CAP(s): Disconn rsp: dcid 0x0047 scid 0x0040
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> log of panic case:
>> >>> >> ------------------------
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> 2009-09-10 13:34:24.020208 < ACL data: handle 1 flags 0x02 dlen 22
>> >>> >>     L2CAP(d): cid 0x0041 len 18 [psm 3]
>> >>> >>       RFCOMM(d): UIH: cr 0 dlci 20 pf 0 ilen 14 fcs 0xeb
>> >>> >>       0000: 0d 0a 2b 43 49 45 56 3a  20 37 2c 33 0d 0a        ..+CIEV: 7,3..
>> >>> >> 2009-09-10 13:34:24.281256 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >> 5
>> >>> >>     handle 1 packets 1
>> >>> >> 2009-09-10 13:34:24.083580 < ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>     L2CAP(d): cid 0x0041 len 4 [psm 3]
>> >>> >>       RFCOMM(s): DISC: cr 0 dlci 20 pf 1 ilen 0 fcs 0x7d
>> >>> >> 2009-09-10 13:34:24.529442 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >> 5
>> >>> >>     handle 1 packets 1
>> >>> >> 2009-09-10 13:34:24.531914 > ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>     L2CAP(d): cid 0x0040 len 4 [psm 3]
>> >>> >>       RFCOMM(s): UA: cr 0 dlci 20 pf 1 ilen 0 fcs 0x57
>> >>> >> 2009-09-10 13:34:24.533135 < ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>     L2CAP(d): cid 0x0041 len 4 [psm 3]
>> >>> >>       RFCOMM(s): DISC: cr 0 dlci 0 pf 1 ilen 0 fcs 0x9c
>> >>> >> 2009-09-10 13:34:25.028649 > HCI Event: Number of Completed Packets (0x13) plen
>> >>> >> 5
>> >>> >>     handle 1 packets 1
>> >>> >> 2009-09-10 13:34:25.032128 > ACL data: handle 1 flags 0x02 dlen 8
>> >>> >>     L2CAP(d): cid 0x0040 len 4 [psm 3]
>> >>> >>       RFCOMM(s): UA: cr 0 dlci 0 pf 1 ilen 0 fcs 0xb6
>> >>> >> 2009-09-10 13:34:25.032341 > ACL data: handle 1 flags 0x02 dlen 12
>> >>> >>     L2CAP(s): Disconn req: dcid 0x0040 scid 0x0041
>> >>> >> 2009-09-10 13:34:25.032646 < ACL data: handle 1 flags 0x02 dlen 12
>> >>> >>     L2CAP(s): Disconn rsp: dcid 0x0040 scid 0x0041
>> >>> >>
>> >>> >> =============
>> >>> >> Analysis Result
>> >>> >> =============
>> >>> >> For some kinds of Bluetooth headset such as Motorola H560/H620 which
>> >>> >> are based on BCM2044S, they will send L2CAP Disconn_Req command right
>> >>> >> after sending rfcomm UA frame. This L2CAP Disconn_Req will cause the
>> >>> >> rfcomm socket state become BT_CLOSED before completely handling UA
>> >>> >> frame, thus it will cause kernel panic. I think we can ignore the
>> >>> >> received rfcomm frames if socket state is BT_CLOSED, because it
>> >>> >> doesn't make sense in the BT_CLOSED state.
>> >>> >>
>> >>> >>
>> >>> >> =============
>> >>> >> Changed Code
>> >>> >> =============
>> >>> >> We changed the code in the function rfcomm_process_rx() in
>> >>> >> net/bluetooth/rfcomm/core.c, check the socket state first before
>> >>> >> handling the received framew. If the socket state is BT_CLOSED, we
>> >>> >> don't handle any rfcomm frames but just close the session.
>> >>> >>
>> >>> >> The change is like below
>> >>> >>
>> >>> >> +       if (sk->sk_state != BT_CLOSED) {
>> >>> >>         /* Get data directly from socket receive queue without copying it. */
>> >>> >>         while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
>> >>> >>                 skb_orphan(skb);
>> >>> >>                 rfcomm_recv_frame(s, skb);
>> >>> >>         }
>> >>> >> -
>> >>> >> -       if (sk->sk_state == BT_CLOSED) {
>> >>> >> +       } else {
>> >>> >>                 if (!s->initiator)
>> >>> >>                         rfcomm_session_put(s);
>> >>> >>
>> >>> >>                 rfcomm_session_close(s, sk->sk_err);
>> >>> >>          }
>> >>> >
>> >>> > so I do see the issue here, but I don't agree with the fix since it
>> >>> > changes behavior that might cause other issues. So in case the frame
>> >>> > processing leads to sk->sk_state == BT_CLOSED we are not closing the
>> >>> > connection anymore if we make it depend on a state before the frame
>> >>> > processing. And nothing guarantees that rfcomm_process_rx gets scheduled
>> >>> > again.
>> >>> >
>> >>> > diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
>> >>> > index 94b3388..606143b 100644
>> >>> > --- a/net/bluetooth/rfcomm/core.c
>> >>> > +++ b/net/bluetooth/rfcomm/core.c
>> >>> > @@ -1798,12 +1798,16 @@ static inline void rfcomm_process_rx(struct rfcomm_session *s)
>> >>> >
>> >>> >        BT_DBG("session %p state %ld qlen %d", s, s->state, skb_queue_len(&sk->sk_receive_queue));
>> >>> >
>> >>> > +       rfcomm_session_hold(s);
>> >>> > +
>> >>> >        /* Get data directly from socket receive queue without copying it. */
>> >>> >        while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
>> >>> >                skb_orphan(skb);
>> >>> >                rfcomm_recv_frame(s, skb);
>> >>> >        }
>> >>> >
>> >>> > +       rfcomm_session_put(s);
>> >>> > +
>> >>> >        if (sk->sk_state == BT_CLOSED) {
>> >>> >                if (!s->initiator)
>> >>> >                        rfcomm_session_put(s);
>> >>> >
>> >>> > What does the above patch do for you? Since if I read it correctly, then
>> >>> > the rfcomm_recv_ua causes the rfcomm_session_put to trigger the closing
>> >>> > of the session. And then in this case it is delayed until after all
>> >>> > frames are processed.
>> >>> >
>> >>> >
>> >>>
>> >>> I've tried your patch but unfortunately kernel panic still happened.
>> >>>
>> >>> From the log I noticed that if rfcomm_l2state_change is called before
>> >>> rfcomm_process_rx, kernel panic will happen definitely.
>> >>>
>> >>> Below lines are in the correct log,
>> >>>
>> >>> [  139.323852] rfcomm_l2data_ready: ccf70000 bytes 4
>> >>> [  139.346252] rfcomm_process_rx: session ccb94ce0 state 8 qlen 1
>> >>> ...
>> >>> [  139.457519] rfcomm_l2state_change: ccf70000 state 9
>> >>> (disconnect ok)
>> >>>
>> >>> In the above case, when process_rx, the code in the condition "if
>> >>> (sk->sk_state == BT_CLOSED)" will never run.
>> >>>
>> >>> Below lines are in the panic log,
>> >>>
>> >>> [  174.898284] rfcomm_l2data_ready: ccf70400 bytes 4
>> >>> [  174.903442] rfcomm_l2state_change: ccf70400 state 9
>> >>> [  174.908874] rfcomm_process_rx: session cc751be0 state 8 qlen 1
>> >>> ...
>> >>> ( then panic)
>> >>>
>> >>> In the above case, sk_state is changed to 9 (BT_CLOSED) firstly,  then
>> >>> process_rx, so the code in the condition "if (sk->sk_state ==
>> >>> BT_CLOSED) " will be run, it will call session_put twice. I think this
>> >>> is the root cause of panic.
>> >>
>> >> I know why it happens, that is not the problem. My point is not to break
>> >> current scheduling assumptions.
>> >>
>> >> So if you move the rfcomm_session_put() now at the end of the function,
>> >> then it should be fine, right?
>> >>
>> >> Regards
>> >>
>> >> Marcel
>> >>
>> >>
>> >>
>> >
>> > You are right. I moved the rfcomm_session_put() at the end of
>> > rfcomm_process_tx() then kernel panic doesn't happen any longer.
>> >
>> > The changed code is like below,
>> >
>> > @@ -1796,6 +1796,8 @@ static inline void rfcomm_process_rx(struct rfcomm_session
>> >
>> >        BT_DBG("session %p state %ld qlen %d", s, s->state, skb_queue_len(&sk->s
>> >
>> > +       rfcomm_session_hold(s);
>> > +
>> >        /* Get data directly from socket receive queue without copying it. */
>> >        while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
>> >                skb_orphan(skb);
>> > @@ -1808,6 +1810,8 @@ static inline void rfcomm_process_rx(struct rfcomm_session
>> >
>> >                rfcomm_session_close(s, sk->sk_err);
>> >        }
>> > +
>> > +       rfcomm_session_put(s);
>> >  }
>> >
>> >  static inline void rfcomm_accept_connection(struct rfcomm_session *s)
>> >
>> > Please submit this change to bluez release.
>> >
>> > Thank you,
>> > Zhu Lan
>>
>> http://android.git.kernel.org/?p=kernel/common.git;a=commit;h=6f505dbe5337e49302574f8d2e65fd83e30f9117
>>
>> If Bluez wants to cherry-pick it.
>
> if it would have a proper commit message and author details, I would,
> but that is not the case. The whole patch + commit message is missing
> the background on why we have to do it.

Guys how are we going to proceed with it? I think this patch is
important as we also see couple of crashes recently. Is commit to
android missing some information?

Regards,
Andrei
--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux