Kernel panic in rfcomm_run - unbalanced refcount on rfcomm_session

Since 2.6.32 we are seeing kernel panics like:

[10651.110229] Unable to handle kernel paging request at virtual address 6b6b6b6b
[10651.111968] Internal error: Oops: 5 [#1] PREEMPT
[10651.113952] CPU: 0    Tainted: G        W   (2.6.32-59979-gd0c97db #1)
[10651.114624] PC is at rfcomm_run+0xa04/0xdbc
<...>
[10651.406188] [<c031ad24>] (rfcomm_run+0xa04/0xdbc) from [<c006ce30>] (kthread+0x78/0x80)
[10651.406585] [<c006ce30>] (kthread+0x78/0x80) from [<c002793c>] (kernel_thread_exit+0x0/0x8)

(rfcomm_run() is entirely inlined, so there is not much of a stack trace.)

This is a use-after-free on struct rfcomm_session s in the call chain
rfcomm_run() -> rfcomm_process_sessions() -> rfcomm_process_dlcs() ->
list_for_each_safe(p, n, &s->dlcs). The only way this can happen is if
there is an unbalanced refcount on the rfcomm session.
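A minimal userspace sketch of this failure mode, under stated assumptions: toy_session, toy_hold(), toy_put() and walk_after_puts() are hypothetical stand-ins, not the real rfcomm_session_hold()/rfcomm_session_put(). The fault address 6b6b6b6b matches the slab free-poison byte (POISON_FREE, 0x6b), which is what you would expect with slab poisoning enabled when a pointer is loaded from an already-freed object. Here a single put with no matching hold "frees" the session, and the next read of the DLC list pointer returns 0x6b6b6b6b.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte pattern slab debugging writes over freed objects
 * (POISON_FREE in the kernel's include/linux/poison.h). */
#define POISON_FREE 0x6b

/* Hypothetical stand-in for struct rfcomm_session. dlcs_next models the
 * first pointer in s->dlcs as a 32-bit value, as on the ARM board above. */
struct toy_session {
    int refcnt;
    uint32_t dlcs_next;
};

/* Static storage instead of a real heap, so reading the "freed" object
 * is well-defined C and the poison bytes are observable. */
static struct toy_session demo;

static void toy_hold(struct toy_session *s) { s->refcnt++; }

/* Drop one reference; on the last one, "free" the object by poisoning it,
 * roughly what kfree() with slab poisoning leaves behind. */
static void toy_put(struct toy_session *s)
{
    if (--s->refcnt == 0)
        memset(s, POISON_FREE, sizeof(*s));
}

/* Returns what a later walk of the DLC list would load from the session. */
uint32_t walk_after_puts(int extra_put)
{
    demo.refcnt = 1;          /* reference held by the session list */
    demo.dlcs_next = 0x1000;  /* pretend address of the first DLC */

    toy_hold(&demo);          /* some path takes a reference ... */
    toy_put(&demo);           /* ... and drops it again: balanced */
    if (extra_put)
        toy_put(&demo);       /* a put with no matching hold frees it */

    return demo.dlcs_next;    /* the processing loop still uses the session */
}
```

With extra_put set to 0 the walk sees the original pointer; with it set to 1 the walk loads the poison pattern, just as the oops above reports.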

We found that reverting the patch
9e726b17422bade75fba94e625cd35fd1353e682 "Bluetooth: Fix rejected
connection not disconnecting ACL link" fixes the issue for us. The
patch itself looks OK: I added some logging to check that the new
refcounts in the patch are balanced, and they are. However, if I remove
the new calls to rfcomm_session_put() and rfcomm_session_hold(), the
crash is resolved. I also found that we can crash without hitting
rfcomm_session_timeout(), so it's not related to Marcel's recent patch
to remove the scheduling-while-atomic warning.
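A balance check of the kind described above could look roughly like the following userspace sketch. The SESSION_HOLD/SESSION_PUT macros, the traced_session type and the scenario paths are all hypothetical, not code from the patch or the kernel. Each call logs its call site, and a nonzero holds-minus-puts total for a path flags an imbalance.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted session; not struct rfcomm_session. */
struct traced_session {
    int refcnt;
    int holds;  /* number of hold calls seen */
    int puts;   /* number of put calls seen */
};

/* Log every hold/put with its call site so unmatched calls stand out. */
static void traced_hold(struct traced_session *s, const char *func, int line)
{
    s->refcnt++;
    s->holds++;
    fprintf(stderr, "hold %p at %s:%d refcnt=%d\n",
            (void *)s, func, line, s->refcnt);
}

static void traced_put(struct traced_session *s, const char *func, int line)
{
    s->refcnt--;
    s->puts++;
    fprintf(stderr, "put  %p at %s:%d refcnt=%d\n",
            (void *)s, func, line, s->refcnt);
}

#define SESSION_HOLD(s) traced_hold((s), __func__, __LINE__)
#define SESSION_PUT(s)  traced_put((s), __func__, __LINE__)

/* Hypothetical paths: a timeout arm/fire pair, and a path with a stray put. */
static void arm_timeout(struct traced_session *s)   { SESSION_HOLD(s); }
static void timeout_fired(struct traced_session *s) { SESSION_PUT(s); }
static void stray_put(struct traced_session *s)     { SESSION_PUT(s); }

/* Net holds minus puts for one scenario; 0 means the traced calls balance. */
int trace_scenario(int with_stray_put)
{
    struct traced_session s = { .refcnt = 1 };

    arm_timeout(&s);
    timeout_fired(&s);
    if (with_stray_put)
        stray_put(&s);
    return s.holds - s.puts;
}
```

A balanced arm/fire pair returns 0, which matches the observation that the patch's own calls balance; the stray put shows up as -1 together with the logged call site that caused it.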

9e726b17422bade75fba94e625cd35fd1353e682 does delay the call to
rfcomm_session_del(), because of the extra reference held while waiting
for the new timeout. I believe this delay has exposed a more subtle
problem elsewhere that unbalances the refcount and then leads to the
kernel panic.
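A hypothetical model of that delay follows; timed_session and the ts_* helpers are illustrative only, not the RFCOMM code or the patch itself. While the timeout is pending it owns a reference, so dropping the last ordinary reference no longer deletes the session; deletion happens only when the timeout fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a session whose disconnect timeout owns a
 * reference while pending. Not the actual RFCOMM code or patch. */
struct timed_session {
    int refcnt;
    bool timer_pending;
    bool deleted;   /* stands in for rfcomm_session_del() having run */
};

static void ts_hold(struct timed_session *s) { s->refcnt++; }

static void ts_put(struct timed_session *s)
{
    if (--s->refcnt == 0)
        s->deleted = true;
}

/* Arming the timeout takes a reference, so the session cannot be
 * deleted until the timeout fires. */
static void ts_set_timer(struct timed_session *s)
{
    if (!s->timer_pending) {
        s->timer_pending = true;
        ts_hold(s);
    }
}

static void ts_timeout(struct timed_session *s)
{
    if (s->timer_pending) {
        s->timer_pending = false;
        ts_put(s);
    }
}

/* True if the session outlives its last ordinary user. */
bool deletion_deferred(bool with_timer)
{
    struct timed_session s = { .refcnt = 1 };

    if (with_timer)
        ts_set_timer(&s);
    ts_put(&s);                    /* last ordinary user drops its reference */
    bool still_alive = !s.deleted;

    ts_timeout(&s);                /* the timer's reference, if any, goes away */
    assert(s.deleted);             /* deleted either way, just later */
    return still_alive;
}
```

If this reading is right, the longer lifetime gives any other unmatched put a wider window in which to free the session while rfcomm_process_sessions() is still using it.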

I have debug kernel logs and HCI logs. They are too large to send to
the list, but I can send them directly to anyone interested in
debugging.

We have seen this crash frequently with a number of headsets since
2.6.32, but not reliably. I do have a 100% reproducible case with a
Garmin Nuvi, using these exact steps:
1) Make sure the Nuvi is unpaired, the BlueZ stack is unpaired, and the
kernel has been rebooted since unpairing.
2) Initiate device discovery, pairing, and the handsfree connection from
the Nuvi.
3) Observe the HFP RFCOMM connection come up briefly, then disconnect,
followed by the kernel panic.

Our short-term solution is unfortunately to revert
9e726b17422bade75fba94e625cd35fd1353e682.

Nick