Hi Gustavo, On Mon, May 17, 2010 at 10:06 PM, Gustavo F. Padovan <gustavo@xxxxxxxxxxx> wrote: >> >> We have a bug with race condition between l2cap tasklet and krfcomm process. >> >> >> >> When sending following sequence: >> >> >> >> ... >> >> No. Time Source Destination Protocol Info >> >> 89 1.951202 RFCOMM Rcvd DISC DLCI=20 >> >> 90 1.951324 RFCOMM Sent UA DLCI=20 >> >> 91 1.959381 HCI_EVT Number of Completed Packets >> >> 92 1.966461 RFCOMM Rcvd DISC DLCI=0 >> >> 93 1.966492 L2CAP Rcvd Disconnect Request >> >> 94 1.972595 L2CAP Sent Disconnect Response >> >> >> >> ... >> >> >> >> krfcommd kernel thread is preempted with l2cap tasklet which remove l2cap_conn >> >> (L2CAP connection handler structure). Then rfcomm thread tries to send RFCOMM >> >> UA which is reply to RFCOMM DISC and when de-referencing l2cap_conn crash >> >> happens. >> >> >> >> ... >> >> [ 694.175933] Unable to handle kernel NULL pointer dereference at >> >> virtual address 00000000 >> >> [ 694.184936] pgd = c0004000 >> >> [ 694.187683] [00000000] *pgd=00000000 >> >> [ 694.191711] Internal error: Oops: 5 [#1] PREEMPT >> >> [ 694.196350] last sysfs file: >> >> /sys/devices/platform/hci_h4p/firmware/hci_h4p/loading >> >> [ 694.260375] CPU: 0 Not tainted (2.6.32.10 #1) >> >> [ 694.265106] PC is at l2cap_sock_sendmsg+0x43c/0x73c [l2cap] >> >> [ 694.270721] LR is at 0xd7017303 >> >> ... >> >> [ 694.525085] Backtrace: >> >> [ 694.527587] [<bf266be0>] (l2cap_sock_sendmsg+0x0/0x73c [l2cap]) >> >> from [<c02f2cc8>] (sock_sendmsg+0xb8 >> >> [ 694.537292] [<c02f2c10>] (sock_sendmsg+0x0/0xd8) from [<c02f3044>] >> >> (kernel_sendmsg+0x48/0x80) >> >> ... >> >> >> >> I have a patch which fixes the issue but not sure that there is no >> >> better way. Waiting for comments. >> >> >> >> Currently I am investigating possibility to: >> >> - implement l2cap_conn reference counting >> > >> > sock_owned_by_user() gives the same effect as a ref count. See comments below. >> > >> >> Yes. For this cases this is enough. But proper refcounting is always >> better than this "effects" ;) >> >> >> - use socket backlog queue to process l2cap packets later when socket is not >> >> owned by the process. >> > >> > You actually don't need a backlog queue here. You can process the signal >> > packet, skipping the l2cap_chan_del() call; >> > >> ... >> >> + /* sk is locked in krfcomm process */ >> >> + if (sock_owned_by_user(sk)) { >> >> + BT_DBG("sk %p is owned by user", sk); >> >> + bh_unlock_sock(sk); >> >> + return 0; >> >> + } >> > >> > That's wrong. Use the sock_owned_by_user() only to avoid the >> > l2cap_chan_del() call, so you can process all the rest of the function >> > and send the Disconnect Response. >> >> I have changed the patch so that it only prevents l2cap_chan_del >> - l2cap_chan_del(sk, ECONNREFUSED); >> + /* delete l2cap channel if sk is not owned by user */ >> + if (!sock_owned_by_user(sk)) >> + l2cap_chan_del(sk, ECONNREFUSED); >> >> Then still l2cap_sock_kill(sk) removes sk but in the second patch sk is >> hold with sock_hold(sk) until rfcomm is still processing packet. > > Hmm, I missed the l2cap_sock_kill(). So I guess you can invert the if > condition and unlock the socket and return if sock is owned by user, > right? I am still thinking that my proposal is better. l2cap_sock_kill(sk) does not delete socket but unlink it from the list and defer deleting to the time we get to sock_put(sk) in the other place sk is used. Socket shall be deleted and this defer deletion is a good solution IMO. But I can also create patch with your proposed change. > Marcel, do you agree with that change? So what would be the right solution? Regards, Andrei > >> >> > The same check should be added to l2cap_connect_rsp() and >> > l2cap_disconnect_rsp(), since they call l2cap_chan_del() under bh >> > context as well. ;) >> >> Added. >> >> Regards, >> Andrei Emeltchenko > >> From 8531c8add8a378fc24b92f2c2311b01a7b1f63fc Mon Sep 17 00:00:00 2001 >> From: Andrei Emeltchenko <andrei.emeltchenko@xxxxxxxxx> >> Date: Mon, 17 May 2010 12:10:46 +0300 >> Subject: [PATCH 1/2] Bluetooth: Check sk is not owned before freeing l2cap_conn >> >> Check that socket sk is not locked in user process before removing >> l2cap connection handler. >> >> krfcommd kernel thread may be preempted with l2cap tasklet which remove >> l2cap_conn structure. If krfcommd is in process of sending of RFCOMM reply >> (like "RFCOMM UA" reply to "RFCOMM DISC") then kernel crash happens. >> >> ... >> [ 694.175933] Unable to handle kernel NULL pointer dereference at virtual address 00000000 >> [ 694.184936] pgd = c0004000 >> [ 694.187683] [00000000] *pgd=00000000 >> [ 694.191711] Internal error: Oops: 5 [#1] PREEMPT >> [ 694.196350] last sysfs file: /sys/devices/platform/hci_h4p/firmware/hci_h4p/loading >> [ 694.260375] CPU: 0 Not tainted (2.6.32.10 #1) >> [ 694.265106] PC is at l2cap_sock_sendmsg+0x43c/0x73c [l2cap] >> [ 694.270721] LR is at 0xd7017303 >> ... >> [ 694.525085] Backtrace: >> [ 694.527587] [<bf266be0>] (l2cap_sock_sendmsg+0x0/0x73c [l2cap]) from [<c02f2cc8>] (sock_sendmsg+0xb8/0xd8) >> [ 694.537292] [<c02f2c10>] (sock_sendmsg+0x0/0xd8) from [<c02f3044>] (kernel_sendmsg+0x48/0x80) >> ... >> >> Modified version after comments of Gustavo F. Padovan <gustavo@xxxxxxxxxxx> >> >> Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@xxxxxxxxx> >> --- >> net/bluetooth/l2cap.c | 14 +++++++++++--- >> 1 files changed, 11 insertions(+), 3 deletions(-) >> >> diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c >> index bb00015..794f2b7 100644 >> --- a/net/bluetooth/l2cap.c >> +++ b/net/bluetooth/l2cap.c >> @@ -2927,7 +2927,9 @@ static inline int l2cap_connect_rsp(struct l2cap_conn *conn, struct l2cap_cmd_hd >> break; >> >> default: >> - l2cap_chan_del(sk, ECONNREFUSED); >> + /* delete l2cap channel if sk is not owned by user */ >> + if (!sock_owned_by_user(sk)) >> + l2cap_chan_del(sk, ECONNREFUSED); >> break; >> } >> >> @@ -3135,7 +3137,10 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn, struct l2cap_cmd >> del_timer(&l2cap_pi(sk)->ack_timer); >> } >> >> - l2cap_chan_del(sk, ECONNRESET); >> + /* delete l2cap channel if sk is not owned by user */ >> + if (!sock_owned_by_user(sk)) >> + l2cap_chan_del(sk, ECONNRESET); >> + >> bh_unlock_sock(sk); >> >> l2cap_sock_kill(sk); >> @@ -3167,7 +3172,10 @@ static inline int l2cap_disconnect_rsp(struct l2cap_conn *conn, struct l2cap_cmd >> del_timer(&l2cap_pi(sk)->ack_timer); >> } >> >> - l2cap_chan_del(sk, 0); >> + /* delete l2cap channel if sk is not owned by user */ >> + if (!sock_owned_by_user(sk)) >> + l2cap_chan_del(sk, 0); >> + >> bh_unlock_sock(sk); >> >> l2cap_sock_kill(sk); >> -- >> 1.7.0.4 >> > >> From 99ffb4e41ff46022cbc52f3c947425791de7fa39 Mon Sep 17 00:00:00 2001 >> From: Andrei Emeltchenko <andrei.emeltchenko@xxxxxxxxx> >> Date: Mon, 17 May 2010 12:28:06 +0300 >> Subject: [PATCH 2/2] Bluetooth: Prevent sk freeing in tasklet using refcount >> >> Socket sk may be freed in tasklet while still be in use in krfcommd >> process. Use refcount to mark sk as used. >> >> Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@xxxxxxxxx> >> --- >> net/bluetooth/l2cap.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c >> index 794f2b7..bf762d6 100644 >> --- a/net/bluetooth/l2cap.c >> +++ b/net/bluetooth/l2cap.c >> @@ -1724,6 +1724,7 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms >> if (msg->msg_flags & MSG_OOB) >> return -EOPNOTSUPP; >> >> + sock_hold(sk); >> lock_sock(sk); >> >> if (sk->sk_state != BT_CONNECTED) { >> @@ -1808,6 +1809,7 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms >> >> done: >> release_sock(sk); >> + sock_put(sk); >> return err; >> } >> >> -- >> 1.7.0.4 >> > > > -- > Gustavo F. Padovan > http://padovan.org > -- To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html