Re: [PATCH 2/2] mac80211: rework the airtime fairness implementation

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Sat, 11 Jun 2022 13:15:08 +0200

Felix Fietkau <nbd@xxxxxxxx> writes:

>>> I tried to avoid it, but I didn't find a better way to do it. I added it 
>>> in order to define the number of levels for the skiplist only once and 
>>> make the code resolve it for the individual functions at compile time in 
>>> a type safe way.
>>> Other data structures in the kernel don't need this, because their 
>>> member node struct size typically doesn't change based on a given parameter.
>> 
>> Well, only way I can think of would be doing something like:
>> 
>> #define MAX_SKIPLIST_LEVELS 5
>> struct skiplist_node {
>> 	struct skiplist_node *next[MAX_SKIPLIST_LEVELS];
>> };
>> 
>> And then having the init function take the actual number of levels for
>> that instance and store it in the _list struct. Then, if someone wants
>> to use the skiplist with more levels at some later point, they'd bump
>> the MAX_SKIPLIST_LEVELS define and use that for their initialiser.
> That's exactly what I wanted to avoid. For lists with lots of elements 
> you might need a lot more levels, which would add a significant amount 
> of bloat for small lists.

Right, fair enough, let's keep the macro thing, then. I have an idea
that I'd like to test out for a skiplist-based BPF map type which would
require runtime sizing, but I can probably fudge that at least until
I've figured out if it's feasible, then we can revisit if needed.

>>>> So this implies that we always schedule in airtime order (i.e., enforce
>>>> fairness) for any driver that can get a meaningful value returned from
>>>> ieee80211_calc_expected_tx_airtime(), right? That's probably OK, but
>>>> just want to make sure we've thought through all the implications of
>>>> this.
>>>> 
>>>> A comment here explaining why this is done would be useful; it's a bit
>>>> counter-intuitive when just looking at the code. Your comment in the
>>>> commit message implies that scheduling doesn't work correctly if this is
>>>> not done, but then what happens if airtime is 0 and we bail out above?
>>> I guess I need to add something to deal with that corner case, maybe by 
>>> returning the smallest possible value for expected tx airtime if it 
>>> can't be calculated.
>> 
>> I think there are two approaches here:
>> 
>> - Make sure we always return at least '1' for airtime, so the counter
>>    always increases on every schedule (effectively making it a frame
>>    counter instead).
>> 
>> - Make sure the skiplist does the right thing if all entries enqueued
>>    have the same (0) value (i.e., that it degrades to round-robin).
>> 
>> I think the skiplist does in fact degrade to round-robin since it treats
>> equality the same as "greater than", but it may be somewhat inefficient
>> when inserting in this case? Or?
> The delete logic in my skiplist code can't properly deal with multiple 
> items having the same value. Adding that would likely make the 
> implementation somewhat more complex, so I decided to sidestep it by 
> making it use an internal wrapper for the comparison function which 
> compares the pointer address in case of equal value.

Ah, right, missed that bit; was only looking at insert.

Okay, but even if you're returning a fixed minimum value for stations
with no real airtime estimation, that would still result in multiple
entries with the same value with a high probability.

Hmm, how about fixing it by adding an "insert epoch" to the
skiplist_node? I.e., on insert, walk all the nodes with the same cmp()
value and set this field to the highest value? Like

cmp(a, b) {
 cmpval = __cmpfunc(a, b);
 if (!cmpval)
   return a->epoch - b->epoch;
}

insert() {
  while (!cmp(node, list[i])) {
    i++; node->epoch++;
  }
}

(doesn't quite work, but hopefully you get the gist of it? epoch should
be reset on dequeue of course).

Something like this would preserve equal-value elements in the same
order across insert-remove, at the cost of making insert less efficient
(I think?) if many items share the same value. Maybe that's acceptable?

>>>> How can this happen in normal operation? This implies that a TXQ was
>>>> scheduled without a backlog; isn't that a bug that we should warn on?
>>> At least mt76 assumes that calling ieee80211_next_txq in a loop during a 
>>> scheduling round will eventually return NULL, even when no frames were 
>>> queued. ath9k could potentially also need this, depending on the block 
>>> ack window state.
>>> This assumption was valid for the previous implementation, and I figured 
>>> it would be cleaner and more robust to preserve it.
>> 
>> Sure, I think we should; but doesn't that already happen above, at the
>> 'if (!node)' check? I.e., if the skiplist is empty we'll return NULL.
> In the state that I'm talking about, the skiplist won't be empty. I'm 
> assuming that the first txq in the list has some queued packets, has not 
> used its AQL budget yet, but the driver still can't pull more packets 
> from it because of driver specific conditions. In that case, it will run 
> a loop where ieee80211_next_txq returns the same queue over and over 
> again, and the driver returns it each time (which causes it to get added 
> back to the list).

But that's what the check against schedule_pos is for, right? I.e.,
making sure that we return NULL the second time we encounter the same
node in a scheduling round.

What I'm talking about is the last condition check in this while loop:

+	while (1) {
+		txqi = to_txq_info(txq);
+		if (test_and_clear_bit(IEEE80211_TXQ_FORCE_ACTIVE, &txqi->flags))
+			break;
+
+		if (txq_has_queue(txq))
+			break;
+
+		airtime_info_next_txq_idx(air_info);
+		txq = air_info->txq[air_info->txq_idx];
+		if (txq_idx == air_info->txq_idx)
+			goto begin;
+	}

(the 'goto begin'). That will only get hit if we've looped through all
the TXQs in an air_info, without finding any that has either the force
bit or a packet queued.

But if we agreed (below) that the logic in ieee80211_return_txq() is:

 	if (force || (txq_has_queue(txq) && ieee80211_txq_airtime_check(hw, &txqi->txq)))
 		__ieee80211_insert_txq(local, air_sched, txqi);
 	else
 		__ieee80211_unschedule_txq(hw, txq, false);

... how does an air_info end up in the skiplist without having either
the force bit or a backlog in at least one TXQ?

>>>> This sets the bit even if the AQL check fails below; is that intentional?
>>> Yes. The bit indicates that the queue should be passed to the driver 
>>> even when mac80211 has no frames queued for it (presumably because the 
>>> driver has queued some internally).
>> 
>> But it won't actually be inserted into the rotation if the AQL check
>> fails (because then the 'else if' check of 'force' won't happen)?
>> 
>> I think the right logic would be:
>> 
>> 	if (force || (txq_has_queue(txq) && ieee80211_txq_airtime_check(hw, &txqi->txq)))
>> 		__ieee80211_insert_txq(local, air_sched, txqi);
>> 	else
>> 		__ieee80211_unschedule_txq(hw, txq, false);
>> 
>> no?
> I thought about it some more, and I think you're right. I was not 
> putting it back into rotation under the assumption that pulled but not 
> fully enqueued packets in the driver don't take up a significant chunk 
> of the AQL budget, but that assumption may not always be true, depending 
> on the driver approach and the tx rate (especially with CCK).

Yup, also the AQL limits are user-configurable, so it's probably best
not to make any assumptions about their magnitude...

>>>> I get the intention behind this, and think it's (probably) reasonable.
>>>> But testing it with an actual ath10k device in push/pull mode would be
>>>> good :)
>>> I'm not very familiar with ath10k, could you help me with that?
>> 
>> Sadly I don't have a good handle on the ath10k either, and there are
>> already issues with the current virtual time-based code in ath10k
>> push/pull-mode. There is some discussion on the openwrt forum[0], which
>> indicates that this patch doesn't help with the issue either.
>> 
>> Unless someone shows up with the time, motivation and hardware to
>> properly fix this, I think the most sensible thing to do may be to just
>> turn the whole of ieee80211_txq_may_transmit() into a wrapper around
>> ieee80211_txq_airtime_check() (i.e., do the AQL check, don't bother with
>> fairness). WDYT?
> I agree. My guess is that proper fairness is probably incompatible with 
> push-pull mode behavior of the firmware anyway.

Yeah, I suspect the reason the old code (back when we did round-robin
scheduling) "works" is that it just keeps looping until the TXQ is
eligible, so it's not enforcing fairness anyway...

-Toke