Re: [PATCH mac80211-next v6] mac80211: Switch to a virtual time-based airtime scheduler

Felix Fietkau <nbd@xxxxxxxx> · Thu, 18 Mar 2021 22:57:31 +0100

Hi Toke,

Thanks for continuing to work on this! I just did a quick reading of the
code and haven't tested this yet - I might have some more comments in
the next few days.

On 2021-03-18 22:31, Toke Høiland-Jørgensen wrote:
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
> 
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
> 
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.
> 
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
> 
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs because it can change the order of the scheduled stations.
> We mitigate this overhead by only resorting when a station changes position
> in the tree, and hopefully N rarely grows too big (it's only TXQs currently
> backlogged, not all associated stations), so it shouldn't be too big of an
> issue.
> 
> To prevent divisions in the fast path, we maintain both station sums and
> pre-computed reciprocals of the sums. This turns the fast-path operation
> into a multiplication, with divisions only happening as the number of
> active stations change (to re-compute the current sum of all active station
> weights). To prevent this re-computation of the reciprocal from happening
> too frequently, we use a time-based notion of station activity, instead of
> updating the weight every time a station gets scheduled or de-scheduled. As
> queues can oscillate between empty and occupied quite frequently, this can
> significantly cut down on the number of re-computations. It also has the
> added benefit of making the station airtime calculation independent on
> whether the queue happened to have drained at the time an airtime value was
> accounted.
> 
> Co-developed-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx>
> Signed-off-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx>
> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> ---
> Respinning this has taken way too long, but assuming anyone actually remembers
> the previous version from a bit over a year ago, here's the changelog:
> 
> Changes since v5:
>   Rebase on latest mac80211-next.
> 
>   Fix issue with scheduling hanging because the schedule position was not
>   cleared when starting a new scheduling round.
> 
>   Switch the reciprocal calculation to use u32 (split 19/13) for per-station
>   weights and a u64 only for the weight sum (to cut down on the number of 64-bit
>   operations performed)
> 
>   Introduce the notion of time-based station activity when calculating weight
>   sums. This also gets rid of the need for a "grace time" when catching up
>   stations, since we now have a direct notion of when a station has been
>   inactive for a while.
Not sure if I'm misunderstanding the code, but this does not seem enough
to me. From what I can see, you consider a station active if it has been
scheduled in the last 100ms. Let's say we keep sending a single small
packet to a particular sta every 90ms (thus keeping it active) for a
long period of time and then suddenly start a really huge transfer.
What keeps it from then taking an unreasonably large share of the
airtime for as long as it takes for the virtual time to catch up?

Am I missing something or should we maybe use the new notion of
time-based activity *and* do a grace time catch up?

- Felix