Hi Luiz,

On Wed, 2011-08-03 at 13:49 -0400, Luiz Augusto von Dentz wrote:
> Hi Peter,
>
> On Wed, Aug 3, 2011 at 7:25 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > I believe that the existing tx scheduler is already inefficient and
> > that this approach significantly compounds that inefficiency.
> >
> > Consider the following scenario with 4 ACL connections (keyboard,
> > mouse, phone, headset):
> >
> > For *one* scheduled ACL tx packet, the connection list will be
> > searched 35~42 times -- that's 140~168 connection list node visits
> > to find one packet! Here's the math:
> >   (7 - priority of tx packet) x 4 connections => found the packet
> >   7 pri levels x 4 connections => no more ACL tx packets
> >   7 pri levels x 4 connections => no SCO tx packets
> >   7 pri levels x 4 connections => no ESCO tx packets
> >   7 pri levels x 4 connections => no LE tx packets
> >   7 pri levels x 4 connections => recalc priority
> >
> > If the connection type doesn't match, it's not going to match at any
> > priority level.
>
> Obviously there are a few things that we might change to not traverse
> the list of connections over and over; remember this is an RFC.
> Besides, it is 8 priorities (0-7), but we can skip much earlier if the
> connection type doesn't match, as you said.

I'm sorry if my comments appeared harsh and I apologize if I offended
you. I get that it's an RFC :)

Personally, I like the idea of prioritizing the tx scheduler, and I
applaud the effort.

My point here (besides pointing out the massive iteration) was to
suggest that perhaps some scheduler state should be preserved so that
already-visited priority levels are not revisited more than once per
tx tasklet schedule.

> Also I guess for SCO/ESCO/LE it doesn't make much sense to have many
> queues/priorities; it is basically ACL only, and that simplifies a
> lot already.

LE should still have priorities. In fact, in a private test we've been
running here, we've merged the scheduler so that LE conns that share
ACL buffers are scheduled alongside other ACL conns (because they
share the same resource -- namely, the acl_cnt). We also merged
SCO/ESCO scheduling together as well.

> >> +
> >> +        if (c->state != BT_CONNECTED && c->state != BT_CONFIG)
> >> +            continue;
> >>
> >> -        num++;
> >> +        num++;
> >>
> >> -        if (c->sent < min) {
> >> -            min  = c->sent;
> >> -            conn = c;
> >> +        if (c->sent < min) {
> >> +            min  = c->sent;
> >> +            conn = c;
> >> +            *queue = &c->data_q[i];
> >> +        }
> >
> > Why preserve the fairness logic?
>
> It does need to be fair if there are 2 or more sockets with the same
> priority; otherwise the first connection in the list might get all
> the quota, and even if we promote the starving queues it may still
> happen at the topmost priority, since there it cannot be promoted
> anymore.

Right. What I meant here was this: if the tx scheduler is re-made as
priority-based, then it seems logical to discard the fairness logic
and design the priority scheme such that starvation is not possible.

For example, the pri levels could be like this:
  0 ~ 5  socket-programmable priorities
  6      special CAP_NET_ADMIN priority
  7      only possible via promotion

(Not that I'm suggesting that's the only or even best way to make
starvation not possible.)
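To make that split concrete, the socket-facing check could look
something like the following. This is just an untested sketch, not
working code: bt_sock_set_priority() and HCI_PRIO_ADMIN are names I
made up here, and HCI_PRIO_MAX is the value from your RFC.

#include <linux/capability.h>
#include <linux/errno.h>
#include <net/sock.h>

#define HCI_PRIO_MAX	8	/* from the RFC: priorities 0-7 */
#define HCI_PRIO_ADMIN	6	/* assumed CAP_NET_ADMIN-only level */

/* Hypothetical sockopt handler: 0-5 freely programmable, 6 requires
 * CAP_NET_ADMIN, 7 unreachable from userspace (promotion only). */
static int bt_sock_set_priority(struct sock *sk, int prio)
{
	if (prio < 0 || prio >= HCI_PRIO_MAX - 1)
		return -EINVAL;		/* level 7 only via promotion */

	if (prio == HCI_PRIO_ADMIN && !capable(CAP_NET_ADMIN))
		return -EPERM;

	sk->sk_priority = prio;
	return 0;
}

That way nothing a normal application can request ever lands on level
7, so the promotion level never has to compete with socket-set
priorities.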
> >> +static void hci_prio_recalculate(struct hci_dev *hdev)
> >> +{
> >> +    struct hci_conn_hash *h = &hdev->conn_hash;
> >> +    int i;
> >> +
> >> +    BT_DBG("%s", hdev->name);
> >> +
> >> +    for (i = HCI_PRIO_MAX - 1; i > 0; i--) {
> >> +        struct list_head *p;
> >> +
> >> +        list_for_each(p, &h->list) {
> >> +            struct hci_conn *c;
> >> +            c = list_entry(p, struct hci_conn, list);
> >> +
> >> +            if (c->sent || skb_queue_empty(&c->data_q[i - 1]))
> >> +                continue;
> >
> > The test for unacked packets does not allow the priority to advance
> > if the unacked packets are from a previously executed hci_tx_task
> > (i.e., a connection transmitted packets on an earlier scheduled
> > tx_task but those packets haven't been acked yet).
>
> IMO it does; why should it be promoted to a higher priority if it
> still has data pending from the last time? This is specifically to
> throttle lower priority in favor of higher. If throughput is
> important, the application should be able to set its priority
> accordingly, and it can do it at any time.
>
> > This would allow a higher priority connection to almost monopolize
> > transmission when the controller is near max load. A higher
> > priority link will receive a tx quota of all available tx buffers,
> > whereas a lower priority link will not advance until its previous
> > tx packets are acked. Worst-case scheduling could take a long time
> > to schedule a lower priority packet.
>
> You probably have never run this code, did you? Only priority 7 can
> really monopolize the connection, and that is on purpose. And yes,
> lower priorities are throttled so higher priorities can get lower
> latency; what is wrong with that?

With no offense meant, simply running this code would be insufficient
to qualify its appropriateness as a replacement scheduler. A
bare-minimum test matrix would ascertain comparative values for
minimum/mean/maximum latency and throughput for
best-case/nominal/worst-case tx loads.

My main concern here is that app-level socket settings can have
dramatic effects on tx scheduling behavior, especially at near max tx
loads.

What I meant about near monopolization is this: a priority 6 link that
retires 2 packets every 55 ms would mean that priority 0 links could
only xmit every 330 ms (1/3 sec.), assuming the controller was already
saturated by the priority 6 link. Even priority 3 links are only going
to be able to xmit every 165 ms under these conditions.

What if, instead, links that are being promoted because the controller
is saturated are promoted to the same (or maybe higher) priority level
than was last successful? (I've put a rough sketch of what I mean a
little further down.)

> >>  static void hci_tx_task(unsigned long arg)
> >>  {
> >>      struct hci_dev *hdev = (struct hci_dev *) arg;
> >> @@ -2253,6 +2316,8 @@ static void hci_tx_task(unsigned long arg)
> >>      while ((skb = skb_dequeue(&hdev->raw_q)))
> >>          hci_send_frame(skb);
> >>
> >> +    hci_prio_recalculate(hdev);
> >
> > The priority recalculation should only be performed if one of the
> > hci_sched_* functions was unable to schedule all outstanding
> > packets (i.e., a tx buffer count dropped to zero).
>
> Yep, gonna fix that too.
>
> > How does this scheduler + priority recalc behave if *no* packets
> > could be scheduled at all? For example, if acl_cnt == 0 when the
> > tx_task is scheduled?
>
> Right now it promotes starving queues, all of them, but maybe it
> shouldn't, so that we reduce the complexity a little bit.

My thinking here is that the tx tasklet shouldn't be scheduled at all
if no work can be performed.
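Coming back to the promotion question above, here is roughly what I
had in mind. Again, just an untested sketch: last_tx_prio is a field
that doesn't exist -- it would have to be recorded by the hci_sched_*
helpers whenever a frame from that connection is actually handed to
the controller -- and data_q[] / HCI_PRIO_MAX are taken from your RFC.

#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>

/* Sketch only: instead of climbing one level per recalculation,
 * promote a connection's queued data straight to the level that last
 * made progress for it (capped below 7, which stays promotion-only). */
static void hci_prio_promote(struct hci_dev *hdev)
{
	struct hci_conn_hash *h = &hdev->conn_hash;
	struct hci_conn *c;

	list_for_each_entry(c, &h->list, list) {
		int i, target;

		if (c->state != BT_CONNECTED && c->state != BT_CONFIG)
			continue;

		/* last_tx_prio: hypothetical field, set by hci_sched_*()
		 * when a frame from this connection is sent */
		target = min(c->last_tx_prio, HCI_PRIO_MAX - 2);

		for (i = 0; i < target; i++)
			if (!skb_queue_empty(&c->data_q[i]))
				skb_queue_splice_tail_init(&c->data_q[i],
							   &c->data_q[target]);
	}
}

Whether the jump should be to exactly the last successful level or one
above it is precisely the kind of thing I'd want latency numbers for.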
Of course, as far as scheduling the tx tasklet when there is nothing
it can do, the existing code suffers from the same defect. Really, the
only reason the tx tasklet *needs* to be scheduled when the relevant
buffer counter is 0 is to catch tx timeouts.

Respectfully,
Peter Hurley
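P.S. Roughly the kind of guard I'm picturing is below -- an untested
sketch, not a patch. hci_tx_schedule() is a made-up wrapper around the
existing tasklet_schedule() call sites, and the 45 second figure just
mirrors the ACL tx timeout check already done in hci_sched_acl().

#include <linux/interrupt.h>
#include <linux/jiffies.h>
#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>

/* Sketch: only kick the tx tasklet when it can actually send
 * something, or when the ACL tx timeout needs checking (a stalled
 * controller must still be noticed even though acl_cnt is 0).
 * A fuller check would also verify something is queued to send. */
static void hci_tx_schedule(struct hci_dev *hdev)
{
	bool can_send = hdev->acl_cnt || hdev->sco_cnt || hdev->le_cnt ||
			!skb_queue_empty(&hdev->raw_q);

	bool timeout_due = !hdev->acl_cnt &&
			time_after(jiffies, hdev->acl_last_tx + HZ * 45);

	if (can_send || timeout_due)
		tasklet_schedule(&hdev->tx_task);
}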