Re: RFCOMM disconnection problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9 August 2012 15:46, Andrei Emeltchenko
<andrei.emeltchenko.news@xxxxxxxxx> wrote:
> Hi Dean,
>
> On Thu, Aug 09, 2012 at 03:29:18PM +0100, Dean Jenkins wrote:
>> One obvious failure of the rfcomm session refcnt is that the refcnt
>> counter either starts with a value of 0 or 1 depending on which peer
>> initiated the connection request, that is wrong. The initiator
>> direction is not relevant for the session as connect and disconnect
>> are independent events. The refcnt should start with a value of 1 in
>> all cases.
>>
>> I am using a 2-core ARM environment that is under high processor
>> loading. The rfcomm session refcnt caused kernel crashes. I used a
>> 2.6.34 kernel but the latest 3.5-RC1 still has the poor rfcomm code.
>> My solution was to remove the rfcomm session refcnt and to ensure that
>> the freeing of the rfcomm session pointer was propagated through-out
>> the rfcomm core code. Some kernel crashes were due to reuse of the
>> freed rfcomm session pointer.
>
> Maybe it does make sense to fix refcounting instead of removing?
>
Hi Andrei,

The existing rfcomm session state machine is capable of determining
when to delete the rfcomm session structure. IMHO the refcnt is an
unnecessary complication. I have fixed rfcomm for our project by
removing the refcnt and ensuring no code can reuse a freed rfcomm
session pointer. This code I have already sent to the mailing list.

In order to get rfcomm to fail, high processor load is necessary
during disconnections to ensure that some of the loops in rfcomm
process more than 1 thing. It is because there are multiple copies of
the rfcomm session pointer s that failure occurs eg. double rfcomm
session free.

Here is an example in net/bluetooth/rfcomm/core.c:

static inline void rfcomm_process_rx(struct rfcomm_session *s)
{
	struct socket *sock = s->sock;
	struct sock *sk = sock->sk;
	struct sk_buff *skb;

	BT_DBG("session %p state %ld qlen %d", s, s->state,
skb_queue_len(&sk->sk_receive_queue));

	/* Get data directly from socket receive queue without copying it. */
	while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
		skb_orphan(skb);
		if (!skb_linearize(skb))
			rfcomm_recv_frame(s, skb);
		else
			kfree_skb(skb);
	}

	if (sk->sk_state == BT_CLOSED) {
		if (!s->initiator)
			rfcomm_session_put(s);

		rfcomm_session_close(s, sk->sk_err);
	}
}

In rfcomm_process_rx(), you can see there is a while loop that causes
frames to be analysed and actioned in rfcomm_recv_frame(). This action
may cause the rfcomm session pointer to be freed because the rfcomm
session refcnt has reached zero. Therefore, the s pointer is now
invalid in the scope of rfcomm_process_rx() because the underlying
session structure was freed. Unfortunately,  if the socket state is
also detected as BT_CLOSED then !s->initiator may cause a crash or at
least accesses wrong data as the s pointer is invalid (pointing to
freed memory or reallocated memory). The refcnt in
rfcomm_session_put(s) is also invalid as s is invalid. In fact looking
at the initiator value is wrong for the rfcomm session. I suspect this
was a workaround that did not fix the root cause.

In practice, this failure scenario is difficult to reproduce (we hit
it in our embedded environment). It requires the socket to be in the
BT_CLOSED at the same time as the rfcomm session has been freed in a
single run of rfcomm_process_rx(). This is why high processor loading
triggers the malfunction because rfcomm_process_rx() maybe pre-empted
or time-sliced so rfcomm_process_rx() takes longer to complete than
normal. This gives the opportunity for the socket state to change to
BT_CLOSED during the run-time of rfcomm_process_rx().

In other words, the refcnt going to zero does not prevent copies of
the rfcomm session pointer s that now points to freed memory (or
reallocated memory) from being erroneously reused.

You can also see

		if (!s->initiator)
			rfcomm_session_put(s);

		rfcomm_session_close(s, sk->sk_err);

If rfcomm_session_put(s) succeeds in causing the refcnt to go to zero,
the rfcomm session structure will be freed. s is now invalid.
Unfortunately, rfcomm_session_close(s, sk->sk_err) reuses the s
pointer with a potential for a crash.

The refcnt could be fixed but fixing it will show it to be unnecessary
code IMHO. One of the weaknesses in the code is not managing the s
pointer after the session has been freed.

I do intend to release a patchset of all our rfcomm changes.

I welcome any comments.

Regards,
Dean

> Best regards
> Andrei Emeltchenko
>
>>
>> I intend to release a patchset of rfcomm fixes.
>>
>> Therefore, IMHO the fix "Bluetooth: Fix RFCOMM session reference
>> counting issue" (commit cf33e7 in linux-stable.git) from Octavian
>> Purdila in kernel 3.4.6 is in fact not fixing the root cause and
>> introduces a misbehaviour of the refcnt. In our project, we rejected
>> this commit because disconnections failed.
>>
>


-- 
Dean Jenkins
Embedded Software Engineer
Professional Services UK/EMEA
MontaVista Software, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux