Search Linux Wireless

Re: [RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:

On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:

On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:

On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:

On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
This switches the airtime scheduler in mac80211 to use a virtual
time-based
scheduler instead of the round-robin scheduler used before. This
has
a
couple of advantages:

- No need to sync up the round-robin scheduler in
firmware/hardware
with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can
schedule
both of
  them; no need to hard-block the scheduling rotation until the
head
of
the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission
becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive,
as
we
need
to maintain an rbtree of TXQs sorted by virtual time. This means
that
ieee80211_register_airtime() becomes O(logN) in the number of
currently
scheduled TXQs. However, hopefully this number rarely grows too
big
(it's
only TXQs currently backlogged, not all associated stations), so
it
shouldn't be too big of an issue.

@@ -1831,18 +1830,32 @@ void
ieee80211_sta_register_airtime(struct
ieee80211_sta *pubsta, u8 tid,
 {
 	struct sta_info *sta = container_of(pubsta, struct sta_info,
sta);
 	struct ieee80211_local *local = sta->sdata->local;
+	struct ieee80211_txq *txq = sta->sta.txq[tid];
 	u8 ac = ieee80211_ac_from_tid(tid);
-	u32 airtime = 0;
+	u64 airtime = 0, weight_sum;
+
+	if (!txq)
+		return;

 	if (sta->local->airtime_flags & AIRTIME_USE_TX)
 		airtime += tx_airtime;
 	if (sta->local->airtime_flags & AIRTIME_USE_RX)
 		airtime += rx_airtime;

+	/* Weights scale so the unit weight is 256 */
+	airtime <<= 8;
+
 	spin_lock_bh(&local->active_txq_lock[ac]);
+
 	sta->airtime[ac].tx_airtime += tx_airtime;
 	sta->airtime[ac].rx_airtime += rx_airtime;
-	sta->airtime[ac].deficit -= airtime;
+
+	weight_sum = local->airtime_weight_sum[ac] ?:
sta->airtime_weight;
+
+	local->airtime_v_t[ac] += airtime / weight_sum;
Hi Toke,

I was porting this version of ATF design to my ath10k platform and found my old kernel version not supporting 64bit division. I'm wondering if it is necessary to use u64 for airtime and weight_sum here though I can find a solution for it. I think u32 might be enough. For airtime, u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 us = 130490, meaning we can support more than 130000 nodes with airtime weight 8092 us.

Another finding was when I configured two 11ac STAs with different airtime weight, such as 256 and 1024 meaning ratio is 1:4, the throughput ratio was not roughly matching the ratio. Could you please share your results? I am not sure if it is due to platform difference.

Hi Toke,

Please ignore the previous two broken emails regarding this new
proposal
from me.

It looks like local->airtime_v_t acts like a Tx criteria. Only the stations with less airtime than that are valid for Tx. That means
there
are situations, like 50 clients, that some of the stations can be
used
to Tx when putting next_txq in the loop. Am I right?

I'm not sure what you mean here. Are you referring to the case
where
new
stations appear with a very low (zero) airtime_v_t? That is handled
when
the station is enqueued.
Hi Toke,

Sorry for the confusion. I am not referring to the case that you
mentioned though it can be solved by your subtle design, max(local
vt,
sta vt). :-)

Actually, my concern is situation about putting next_txq in the
loop.
Let me explain a little more and see below.

@@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw,
u8
ac)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
+	struct rb_node *node = local->schedule_pos[ac];
 	struct txq_info *txqi = NULL;
+	bool first = false;

 	lockdep_assert_held(&local->active_txq_lock[ac]);

- begin:
-	txqi = list_first_entry_or_null(&local->active_txqs[ac],
-					struct txq_info,
-					schedule_order);
-	if (!txqi)
+	if (!node) {
+		node = rb_first_cached(&local->active_txqs[ac]);
+		first = true;
+	} else
+		node = rb_next(node);

Consider below piece of code from ath10k_mac_schedule_txq:

         ieee80211_txq_schedule_start(hw, ac);
         while ((txq = ieee80211_next_txq(hw, ac))) {
                 while (ath10k_mac_tx_can_push(hw, txq)) {
                         ret = ath10k_mac_tx_push_txq(hw, txq);
                         if (ret < 0)
                                 break;
                 }
                 ieee80211_return_txq(hw, txq);
                 ath10k_htt_tx_txq_update(hw, txq);
                 if (ret == -EBUSY)
                         break;
         }
         ieee80211_txq_schedule_end(hw, ac);

If my understanding is right, local->schedule_pos is used to record
the
last scheduled node and used for traversal rbtree for valid txq.
There
is chance that an empty txq is feeded to return_txq and got removed
from
rbtree. The empty txq will always be the rb_first node. Then in the
following next_txq, local->schedule_pos becomes meaningless since
its
rb_next will return NULL and the loop break. Only rb_first get
dequeued
during this loop.

	if (!node || RB_EMPTY_NODE(node)) {
		node = rb_first_cached(&local->active_txqs[ac]);
		first = true;
	} else
		node = rb_next(node);

Ah, I see what you mean. Yes, that would indeed be a problem - nice
catch! :)

How about this? The nodes on the rbtree will be dequeued and removed
from rbtree one by one until HW is busy. Please note local vt and
sta
vt will not be updated since txq lock is held during this time.

Insertion and removal from the rbtree are relatively expensive, so
I'd
rather not do that for every txq. I think a better way to solve this
is to just defer the actual removal from the tree until
ieee80211_txq_schedule_end()... Will fix that when I submit this
again.

Do you mean we keep the empty txqs in the rbtree until loop finishes
and
remove them in ieee80211_txq_schedule_end(may be put return_txq in
it)?
If it is the case, I suppose a list is needed to store the empty txqs
so
as to dequeue them in ieee80211_txq_schedule_end.

Yeah, return_txq() would just put "to be removed" TXQs on a list, and
schedule_end() would do the actual removal (after checking whether a
new
packet showed up in the meantime).

SGTM


And one more thing,

+ if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
+                       if (first)
+                               local->airtime_v_t[ac] =
sta->airtime[ac].v_t;
+                       else
+                               return NULL;

As local->airtime_v_t will not be updated during loop, we don't need
to
return NULL.

Yes we do; this is actually the break condition. I.e., stations whose
virtual time are higher than the global time (in local->airtime_v_t)
are
not allowed to transmit. And since we are traversing them in order,
when
we find the first such station, we are done and can break out of the
scheduling loop entirely (which is what we do by returning NULL). The
other branch in the inner if() is just for the case where no stations
are currently eligible to transmit according to this rule; here we
don't
want to stall, so we advance the global timer so the first station
becomes eligible...

Yes,the inner if() make sure first node always get scheduled no matter
its vt.

To detail my concern, let's assume only two nodes in the tree and
empty nodes will be in tree until schedule_end(). In the loop and in
case hw is not busy, ath10k will drain every node next_txq returned
before asking for another txq again. Then as we are traversing to next
rb node, it is highly possible the second node is not allowed to
transmit since the global time has not been updated yet as the active
txq lock is held. At this time, only second node on the tree has data
and hw is capable of sending more data. I don't think the second node
is not valid for transmission in this situation.

With more nodes in the tree in this situation, I think same thing
happens that all nodes except the first node are not allowed to
transmit since none of their vts are less than the global time which
is not updated in time. The loop breaks when we are checking the
second node.

Yeah, in many cases we will end up throttling all but the first (couple
of) node(s). This is by design; otherwise we can't ensure fairness. As
long as we are making forward progress that is fine, though...

-Toke

--
Yibo



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux