Re: [PATCH 3/4] Bluetooth: Limit depth of the HCI TX queue with ERTM mode

Mat Martineau <mathewm@xxxxxxxxxxxxxx> · Thu, 9 Jun 2011 16:36:29 -0700 (PDT)

Gustavo,

On Wed, 8 Jun 2011, Gustavo F. Padovan wrote:

> Hi Mat,
>
> * Mat Martineau <mathewm@xxxxxxxxxxxxxx> [2011-06-03 16:21:09 -0700]:
>
>> In order to provide timely responses to REJ, SREJ, and poll input from
>> the remote device, it helps to reduce the number of ERTM data frames
>> in the HCI TX queue at one time. If a full TX window of data is in the
>> HCI TX queue, any responses to REJ, SREJ, or polls must wait in line
>> behind all previously queued data. This latency leads to disconnects,
>> and will be more severe with extended window sizes.
>
> I prefer if we go with a hci_send_acl_prio() implementation. It will have much
> less overhead using a workqueue. As it will be filled only by S-frames with a
> few bytes each I don't think we will have problems. So let's go with this
> approach and see what we can get.

I considered that approach too, but it breaks the assumption that
frames are transmitted in the order L2CAP generates them, and I don't
think it complies with the ERTM spec.  I-frames contain reqseq fields
and a final bit, so if S-frames and I-frames are delivered out of
sequence, you can easily end up with a confusing series of reqseq
values at the receiver.

Suppose the HCI tx queue is full of I-frames, and the oldest I-frame
has reqseq set to 1.  Since that I-frame was queued, other incoming
I-frames have been processed, and the last received I-frame had
txseq 20.  The remote device sends a poll, and we reply with an RR
(reqseq 21) using the priority queue.  HCI sends that RR first, then
an I-frame from the normal queue with reqseq 1.  Now the remote side
thinks it missed all of the frames from 21 to 1 (having wrapped
around).  The remote side then has to send REJ or SREJ frames, even
though nothing is actually missing.
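
To make the reordering concrete, here is a small userspace model of the
scenario above.  It is only a sketch, not kernel code: the struct, the
function names, and the two-queue model are made up for illustration,
and it assumes 6-bit (modulo 64) sequence numbers.  It checks each
reqseq the remote side sees against the range of frames the remote side
has sent but not yet had acknowledged.  How a real stack reacts to an
out-of-range value is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_MODULO 64 /* assumption: 6-bit ERTM sequence numbers */

/* Hypothetical view of the remote sender's transmit state. */
struct remote_tx_state {
	uint8_t expected_ack_seq; /* oldest frame not yet acknowledged */
	uint8_t next_tx_seq;      /* sequence number of the next new frame */
};

/* Forward distance from 'from' to 'to' in modulo-64 sequence space. */
static uint8_t seq_distance(uint8_t from, uint8_t to)
{
	return (uint8_t)((to + SEQ_MODULO - from) % SEQ_MODULO);
}

/*
 * Build the on-the-wire order of reqseq values.  With prio_first set,
 * frames from the (hypothetical) priority queue jump ahead of the data
 * queue; otherwise they go out after the already-queued data.
 */
static size_t wire_order(const uint8_t *data_q, size_t n_data,
			 const uint8_t *prio_q, size_t n_prio,
			 int prio_first, uint8_t *wire)
{
	size_t n = 0, i;

	assert(wire != NULL);
	if (prio_first)
		for (i = 0; i < n_prio; i++)
			wire[n++] = prio_q[i];
	for (i = 0; i < n_data; i++)
		wire[n++] = data_q[i];
	if (!prio_first)
		for (i = 0; i < n_prio; i++)
			wire[n++] = prio_q[i];
	return n;
}

/*
 * Count reqseq values that fall outside the remote side's range of
 * unacknowledged frames.  In-range values advance expected_ack_seq.
 */
static int count_stale_reqseq(struct remote_tx_state s,
			      const uint8_t *wire, size_t n)
{
	int stale = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (seq_distance(s.expected_ack_seq, wire[i]) <=
		    seq_distance(s.expected_ack_seq, s.next_tx_seq))
			s.expected_ack_seq = wire[i];
		else
			stale++;
	}
	return stale;
}
```

With the remote side having sent txseq 0 through 20 (expected_ack_seq 0,
next_tx_seq 21), a queued I-frame carrying reqseq 1, and an RR carrying
reqseq 21, the in-order case (1, then 21) yields no out-of-range values.
Sending the RR first (21, then 1) leaves the stale reqseq 1 a forward
distance of 44 from the acknowledged position, which is out of range.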

So, I think we have two options:

 * Use the skb_destructor mechanism to pull data for ERTM (which is 
what my patch does), and leave queuing for other modes alone
 * Rearchitect HCI & L2CAP so that data is pulled from the L2CAP layer 
as num_comp_pkts events are received

I realize there is increased overhead to make the callbacks to get 
data out of the ERTM tx queue, but the skb destructor is very 
lightweight (since it uses an atomic_t counter).  The overhead is 
tunable using L2CAP_MAX_ERTM_QUEUED and L2CAP_MIN_ERTM_QUEUED to 
control how often the callback to l2cap_ertm_send() is actually made. 
With the current queuing behavior, things get unmanageable on AMP with 
extra latency from larger tx windows and much shorter timeouts.
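
For reference, the watermark idea can be sketched in userspace C11 as
below.  This is not the code from the patch: the struct, the function
names, and the DEMO_* values are made up for illustration (they stand
in for L2CAP_MAX_ERTM_QUEUED and L2CAP_MIN_ERTM_QUEUED, whose real
values are not shown here), and the exact comparison against the low
watermark is an assumption.  The model also calls the refill function
directly from the destructor to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_MAX_QUEUED 5 /* hypothetical high watermark */
#define DEMO_MIN_QUEUED 2 /* hypothetical low watermark */

/* Hypothetical per-channel state for the model. */
struct ertm_chan {
	atomic_int hci_queued; /* frames handed to HCI and not yet freed */
	int pending;           /* frames still waiting in the ERTM tx queue */
	int refill_calls;      /* how often the refill callback ran */
};

static void ertm_chan_init(struct ertm_chan *c, int pending)
{
	assert(pending >= 0);
	atomic_init(&c->hci_queued, 0);
	c->pending = pending;
	c->refill_calls = 0;
}

/* Stand-in for l2cap_ertm_send(): top HCI up to the high watermark. */
static void demo_ertm_send(struct ertm_chan *c)
{
	c->refill_calls++;
	while (c->pending > 0 &&
	       atomic_load(&c->hci_queued) < DEMO_MAX_QUEUED) {
		c->pending--;
		atomic_fetch_add(&c->hci_queued, 1);
	}
}

/*
 * Stand-in for the skb destructor: one atomic decrement per freed
 * frame, and a refill only once the count reaches the low watermark.
 */
static void demo_frame_destructor(struct ertm_chan *c)
{
	/* atomic_fetch_sub returns the value before the subtraction */
	int left = atomic_fetch_sub(&c->hci_queued, 1) - 1;

	if (left <= DEMO_MIN_QUEUED && c->pending > 0)
		demo_ertm_send(c);
}
```

Starting with 10 pending frames, the first demo_ertm_send() call queues
5 of them.  The first two destructor calls only decrement the counter;
the third brings it down to the low watermark and triggers a single
refill back up to 5.  Widening the gap between the two watermarks makes
the refill callback run less often.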

Regards,

--
Mat Martineau
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum