Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:

> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes:
>>
>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>> time-based scheduler instead of the round-robin scheduler used
>>>> before. This has a couple of advantages:
>>>>
>>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>>>   with the round-robin airtime scheduler.
>>>>
>>>> - If several stations are eligible for transmission we can schedule
>>>>   both of them; no need to hard-block the scheduling rotation until
>>>>   the head of the queue has used up its quantum.
>>>>
>>>> - The check of whether a station is eligible for transmission
>>>>   becomes simpler (in ieee80211_txq_may_transmit()).
>>>>
>>>> The drawback is that scheduling becomes slightly more expensive, as
>>>> we need to maintain an rbtree of TXQs sorted by virtual time. This
>>>> means that ieee80211_register_airtime() becomes O(logN) in the
>>>> number of currently scheduled TXQs. However, hopefully this number
>>>> rarely grows too big (it's only TXQs currently backlogged, not all
>>>> associated stations), so it shouldn't be too big of an issue.
>>>>
>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>>>  {
>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>> -	u32 airtime = 0;
>>>> +	u64 airtime = 0, weight_sum;
>>>> +
>>>> +	if (!txq)
>>>> +		return;
>>>>
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>  		airtime += tx_airtime;
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>  		airtime += rx_airtime;
>>>>
>>>> +	/* Weights scale so the unit weight is 256 */
>>>> +	airtime <<= 8;
>>>> +
>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>> +
>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>> -	sta->airtime[ac].deficit -= airtime;
>>>> +
>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>>> +
>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>
>>> Hi Toke,
>>>
>>> Please ignore the previous two broken emails regarding this new
>>> proposal from me.
>>>
>>> It looks like local->airtime_v_t acts like a Tx criterion. Only the
>>> stations with less airtime than that are valid for Tx. That means
>>> there are situations, such as with 50 clients, where some of the
>>> stations can be used to Tx when putting next_txq in the loop. Am I
>>> right?
>>
>> I'm not sure what you mean here. Are you referring to the case where
>> new stations appear with a very low (zero) airtime_v_t? That is
>> handled when the station is enqueued.
>
> Hi Toke,
>
> Sorry for the confusion. I am not referring to the case that you
> mentioned, though it can be solved by your subtle design, max(local vt,
> sta vt). :-)
>
> Actually, my concern is the situation where next_txq is put in the
> loop. Let me explain a little more and see below.
>
>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct rb_node *node = local->schedule_pos[ac];
>>  	struct txq_info *txqi = NULL;
>> +	bool first = false;
>>
>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>
>> -begin:
>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>> -					struct txq_info,
>> -					schedule_order);
>> -	if (!txqi)
>> +	if (!node) {
>> +		node = rb_first_cached(&local->active_txqs[ac]);
>> +		first = true;
>> +	} else
>> +		node = rb_next(node);
>
> Consider the below piece of code from ath10k_mac_schedule_txq:
>
> 	ieee80211_txq_schedule_start(hw, ac);
> 	while ((txq = ieee80211_next_txq(hw, ac))) {
> 		while (ath10k_mac_tx_can_push(hw, txq)) {
> 			ret = ath10k_mac_tx_push_txq(hw, txq);
> 			if (ret < 0)
> 				break;
> 		}
> 		ieee80211_return_txq(hw, txq);
> 		ath10k_htt_tx_txq_update(hw, txq);
> 		if (ret == -EBUSY)
> 			break;
> 	}
> 	ieee80211_txq_schedule_end(hw, ac);
>
> If my understanding is right, local->schedule_pos is used to record the
> last scheduled node and to traverse the rbtree for a valid txq. There
> is a chance that an empty txq is fed to return_txq and gets removed
> from the rbtree. The empty txq will always be the rb_first node. Then
> in the following next_txq, local->schedule_pos becomes meaningless,
> since rb_next on it will return NULL and the loop breaks. Only rb_first
> gets dequeued during this loop.
>
> 	if (!node || RB_EMPTY_NODE(node)) {
> 		node = rb_first_cached(&local->active_txqs[ac]);
> 		first = true;
> 	} else
> 		node = rb_next(node);

Ah, I see what you mean. Yes, that would indeed be a problem - nice
catch! :)

> How about this? The nodes on the rbtree will be dequeued and removed
> from the rbtree one by one until the HW is busy. Please note that the
> local vt and sta vt will not be updated since the txq lock is held
> during this time.

Insertion and removal from the rbtree are relatively expensive, so I'd
rather not do that for every txq.

I think a better way to solve this is to just defer the actual removal
from the tree until ieee80211_txq_schedule_end()...

Will fix that when I submit this again.

-Toke
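
To make the accounting in the quoted ieee80211_sta_register_airtime()
hunk concrete, here is a minimal, self-contained model of the
weight-scaled virtual-time update. The struct and function names below
are illustrative stand-ins, not the mac80211 API; the arithmetic matches
the hunk: reported airtime is scaled so the unit weight is 256, and the
per-AC virtual time advances by that airtime divided by the sum of the
active stations' weights.

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the per-AC scheduler state. */
struct airtime_sched {
	uint64_t v_t;		/* global virtual time for this AC */
	uint64_t weight_sum;	/* sum of weights of active stations */
};

/* Scale airtime so the unit weight is 256, then advance the global
 * virtual time by airtime divided by the total active weight. */
static void register_airtime(struct airtime_sched *s, uint64_t airtime_usec)
{
	uint64_t airtime = airtime_usec << 8;

	if (!s->weight_sum)	/* nothing active; avoid dividing by zero */
		return;

	s->v_t += airtime / s->weight_sum;
}

int main(void)
{
	/* Two stations, each with the default weight of 256. */
	struct airtime_sched s = { .v_t = 0, .weight_sum = 512 };

	/* One station reports 1000 us of airtime: v_t advances by
	 * (1000 << 8) / 512 = 500, i.e. the cost is spread over the
	 * total weight rather than charged at full rate. */
	register_airtime(&s, 1000);
	printf("v_t = %llu\n", (unsigned long long)s.v_t);

	return 0;
}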
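
Likewise, the deferred-removal idea mentioned at the end (keep an empty
TXQ in the rbtree for the rest of the scheduling round and only erase it
in the schedule_end path, once the saved traversal position is no longer
needed) could look roughly like the sketch below. The helper and field
names are hypothetical, not the actual mac80211 implementation.

#include <linux/rbtree.h>
#include <linux/types.h>

/* Hypothetical per-TXQ node; only the fields needed for the sketch. */
struct txq_node {
	struct rb_node rb;	/* position in the per-AC rbtree */
	bool remove_pending;	/* emptied during the current round */
};

/* Called where return_txq() would otherwise erase the node right away:
 * just flag it, so rb_next() on the saved schedule position stays valid. */
static void sched_return_txq(struct txq_node *txq, bool empty)
{
	if (empty)
		txq->remove_pending = true;
}

/* Called from the schedule_end path: one pass over the tree, erasing
 * everything that was flagged while the round was in progress. */
static void sched_schedule_end(struct rb_root_cached *tree)
{
	struct rb_node *node = rb_first_cached(tree);

	while (node) {
		struct txq_node *txq = rb_entry(node, struct txq_node, rb);

		node = rb_next(node);	/* advance before erasing */
		if (txq->remove_pending) {
			rb_erase_cached(&txq->rb, tree);
			RB_CLEAR_NODE(&txq->rb);
			txq->remove_pending = false;
		}
	}
}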