Felix Fietkau <nbd@xxxxxxxx> writes: > Hi Toke, > > Thanks for continuing to work on this! I just did a quick reading of the > code and haven't tested this yet - I might have some more comments in > the next few days. Awesome! You're welcome, and thanks for taking a look :) > On 2021-03-18 22:31, Toke Høiland-Jørgensen wrote: >> This switches the airtime scheduler in mac80211 to use a virtual time-based >> scheduler instead of the round-robin scheduler used before. This has a >> couple of advantages: >> >> - No need to sync up the round-robin scheduler in firmware/hardware with >> the round-robin airtime scheduler. >> >> - If several stations are eligible for transmission we can schedule both of >> them; no need to hard-block the scheduling rotation until the head of the >> queue has used up its quantum. >> >> - The check of whether a station is eligible for transmission becomes >> simpler (in ieee80211_txq_may_transmit()). >> >> The drawback is that scheduling becomes slightly more expensive, as we need >> to maintain an rbtree of TXQs sorted by virtual time. This means that >> ieee80211_register_airtime() becomes O(logN) in the number of currently >> scheduled TXQs because it can change the order of the scheduled stations. >> We mitigate this overhead by only resorting when a station changes position >> in the tree, and hopefully N rarely grows too big (it's only TXQs currently >> backlogged, not all associated stations), so it shouldn't be too big of an >> issue. >> >> To prevent divisions in the fast path, we maintain both station sums and >> pre-computed reciprocals of the sums. This turns the fast-path operation >> into a multiplication, with divisions only happening as the number of >> active stations change (to re-compute the current sum of all active station >> weights). To prevent this re-computation of the reciprocal from happening >> too frequently, we use a time-based notion of station activity, instead of >> updating the weight every time a station gets scheduled or de-scheduled. As >> queues can oscillate between empty and occupied quite frequently, this can >> significantly cut down on the number of re-computations. It also has the >> added benefit of making the station airtime calculation independent on >> whether the queue happened to have drained at the time an airtime value was >> accounted. >> >> Co-developed-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx> >> Signed-off-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx> >> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> >> --- >> Respinning this has taken way too long, but assuming anyone actually remembers >> the previous version from a bit over a year ago, here's the changelog: >> >> Changes since v5: >> Rebase on latest mac80211-next. >> >> Fix issue with scheduling hanging because the schedule position was not >> cleared when starting a new scheduling round. >> >> Switch the reciprocal calculation to use u32 (split 19/13) for per-station >> weights and a u64 only for the weight sum (to cut down on the number of 64-bit >> operations performed) >> >> Introduce the notion of time-based station activity when calculating weight >> sums. This also gets rid of the need for a "grace time" when catching up >> stations, since we now have a direct notion of when a station has been >> inactive for a while. > Not sure if I'm misunderstanding the code, but this does not seem enough > to me. From what I can see, you consider a station active if it has been > scheduled in the last 100ms. Let's say we keep sending a single small > packet to a particular sta every 90ms (thus keeping it active) for a > long period of time and then suddenly start a really huge transfer. > What keeps it from then taking an unreasonably large share of the > airtime for as long as it takes for the virtual time to catch up? > > Am I missing something or should we maybe use the new notion of > time-based activity *and* do a grace time catch up? Hmm, yeah, I guess you're right? I haven't seen this in practice, but I haven't been looking for it either. And I suppose someone who set out to trigger this would be able to. So adding back the grace time (but maybe with a longer interval?) would probably be best. I'll wait for your other comments before respinning, though... -Toke