> - AQL estimated airtime does not take into account A-MPDU, so it is
> significantly overestimating airtime use for aggregated traffic,
> especially on high rates.
> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

Yes, that's a known limitation that needs to be improved. I actually
posted a comment about this on the patch "[PATCH v10 2/4] mac80211:
Import airtime calculation code from mt76":

> When a txq is subjected to the queue limit, there is usually a
> significant amount of frames being queued and those frames are likely
> being sent out in large aggregations. So, in most cases when AQL is
> active, the medium access overhead can be amortized over many frames
> and the per-frame overhead could be considerably lower than 36,
> especially at higher data rates. As a result, the pending airtime
> calculated this way could be higher than the actual airtime. In my
> test, I have to compensate for that by increasing "aql_txq_limit" via
> debugfs to get the same peak throughput as without AQL.

> My proposed solution would be to check for a running aggregation session
> and set estimated tx time to something like:
> expected_airtime(16 * skb->len) / 16.

I think that's a reasonable approximation, but I doubt aggregation
information is available in all architectures. In some architectures,
the firmware may only report aggregation information after the frame
has been transmitted.

In my earlier version of AQL for the out-of-tree ChromeOS kernel,
A-MPDU is dealt with this way: the medium access overhead is only
counted once per TXQ for all frames released to the hardware over a
4 ms period, assuming those frames are likely to be aggregated
together.

Instead of calculating the estimated airtime using the last known PHY
rate and then trying to add some estimated overhead for medium access
time, another potentially better approach is to use the average data
rate, which is bytes_transmitted / firmware_reported_actual_airtime.
The average rate not only includes the medium access overhead, but
also takes retries into account.

> - We need an API that allows the driver to change the pending airtime
> values, e.g. subtract estimated tx time for a packet.
> mt76 and ath9k can queue packets inside the driver that are not currently
> in the hardware queues. Typically if the txqs have more data than what
> gets put into the hardware queue, both drivers will pull an extra frame
> and queue it in its private txq struct. This frame will get used on the
> next txq scheduling round for that particular station.
> If you have lots of stations doing traffic (or having driver buffered
> frames in powersave mode), this could use up a sizable chunk of the AQL
> budget.

Not sure I fully understand your concerns here. The AQL budget is per
STA, per TID, so frames queued in the driver's special queue for one
station won't impact another station's budget. Those frames are
counted towards the per-interface pending airtime, which could trigger
AQL to switch to the lower queue limit. In that case, this could be
the desirable behavior when there is heavy traffic.
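Going back to the estimate discussion above, here is a minimal,
self-contained sketch of the two ideas (Felix's per-A-MPDU division and
the average-rate alternative). The rate model and the helper names
(expected_airtime_us, aql_estimate_ampdu, aql_estimate_avg_rate) are
illustrative assumptions, not the mac80211 implementation; there the
per-frame estimate comes from the airtime calculation code imported
from mt76.

/*
 * Illustrative sketch only: the rate model below is a toy stand-in for
 * mac80211's airtime calculation, and the helper names are made up.
 */
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_AMPDU_FRAMES		16
#define MEDIUM_ACCESS_OVERHEAD_US	36	/* fixed per-TX overhead figure used in the thread */

/* Toy stand-in for a rate-based estimate: time to transmit 'len' bytes
 * at 'rate_mbps', plus one medium access overhead. */
uint32_t expected_airtime_us(uint32_t len, uint32_t rate_mbps)
{
	return (len * 8) / rate_mbps + MEDIUM_ACCESS_OVERHEAD_US;
}

/* Felix's proposal: with a running aggregation session, charge this
 * frame 1/16th of the airtime of a 16-frame burst, so the fixed
 * overhead is amortized over the assumed A-MPDU. */
uint32_t aql_estimate_ampdu(uint32_t skb_len, uint32_t rate_mbps,
			    bool agg_session_running)
{
	if (!agg_session_running)
		return expected_airtime_us(skb_len, rate_mbps);

	return expected_airtime_us(ASSUMED_AMPDU_FRAMES * skb_len, rate_mbps) /
	       ASSUMED_AMPDU_FRAMES;
}

/* Average-rate alternative: derive the estimate from what the firmware
 * reported for recent traffic, so medium access overhead and retries
 * are already folded into the rate. */
uint32_t aql_estimate_avg_rate(uint32_t skb_len, uint64_t bytes_transmitted,
			       uint64_t reported_airtime_us)
{
	if (!bytes_transmitted)
		return MEDIUM_ACCESS_OVERHEAD_US;	/* no history yet */

	return (uint32_t)(skb_len * reported_airtime_us / bytes_transmitted);
}

With this toy model, a 1500-byte frame at 600 Mbit/s is charged 56 us
by the plain estimate but only about 22 us by the aggregation-aware
one, which is roughly the kind of overestimate the "aql_txq_limit"
debugfs workaround mentioned above has to compensate for.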
Regards,
Kan

On Wed, Feb 26, 2020 at 1:56 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Felix Fietkau <nbd@xxxxxxxx> writes:
>
> > Hi,
> >
> > I've taken a closer look at the AQL implementation, and I found some
> > corner cases that need to be addressed soon:
> >
> > - AQL estimated airtime does not take into account A-MPDU, so it is
> > significantly overestimating airtime use for aggregated traffic,
> > especially on high rates.
> > My proposed solution would be to check for a running aggregation session
> > and set estimated tx time to something like:
> > expected_airtime(16 * skb->len) / 16.
>
> This seems reasonable. Not sure if it'll do anything for ath10k (does
> mac80211 know the aggregation state for that?), but maybe this is not
> such a big issue on that hardware?
>
> > - We need an API that allows the driver to change the pending airtime
> > values, e.g. subtract estimated tx time for a packet.
> > mt76 and ath9k can queue packets inside the driver that are not currently
> > in the hardware queues. Typically if the txqs have more data than what
> > gets put into the hardware queue, both drivers will pull an extra frame
> > and queue it in its private txq struct. This frame will get used on the
> > next txq scheduling round for that particular station.
> > If you have lots of stations doing traffic (or having driver buffered
> > frames in powersave mode), this could use up a sizable chunk of the AQL
> > budget.
>
> I'm a bit more skeptical about this. If the driver buffers a bunch of
> packets that are not accounted for, that will hurt that station due to
> extra latency when it wakes up. For ath9k, this is the retry_q you're
> talking about, right? The number of packets queued on that is fairly
> limited, isn't it? What kind of powersave buffering is the driver
> doing, and why can't it leave the packets on the TXQ? That would allow
> them to be scheduled along with any new ones that might have arrived
> in the meantime, which would be a benefit for latency.
>
> I can see how it can be a problem that many stations in powersave can
> block transmissions for *other* stations, though. Maybe an alternative
> to the driver subtracting airtime could be to have mac80211 do
> something like this when a station is put into powersave mode:
>
> local->aql_total_pending_airtime -= sum(sta->airtime[ac].aql_tx_pending)
>
> (where sum is the summation over all ACs)
>
> and the reverse when the station wakes back up? That should keep the
> whole device from throttling but still prevent a big queue building up
> for that sta when it wakes back up. Would that work, do you think?
>
> -Toke
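For reference, a rough sketch of the powersave bookkeeping Toke
describes at the end of the quoted mail, using simplified stand-ins for
mac80211's ieee80211_local and sta_info (the real counters are atomics
and live in different structs); this only illustrates the idea, it is
not the actual mac80211 code.

#include <stdint.h>

#define NUM_ACS 4	/* access categories */

struct sta_airtime {
	uint32_t aql_tx_pending;	/* pending airtime for this sta/AC, in usec */
};

struct sta {
	struct sta_airtime airtime[NUM_ACS];
};

struct local {
	uint32_t aql_total_pending_airtime;	/* device-wide pending airtime, in usec */
};

/* Station enters powersave: stop counting its pending airtime against
 * the device-wide total, so sleeping stations cannot push the
 * interface over the threshold that switches AQL to the lower limit. */
void sta_ps_enter(struct local *local, struct sta *sta)
{
	for (int ac = 0; ac < NUM_ACS; ac++)
		local->aql_total_pending_airtime -=
			sta->airtime[ac].aql_tx_pending;
}

/* Station wakes up: add the pending airtime back, so the per-station
 * limit still caps how much can be queued for the waking station. */
void sta_ps_exit(struct local *local, struct sta *sta)
{
	for (int ac = 0; ac < NUM_ACS; ac++)
		local->aql_total_pending_airtime +=
			sta->airtime[ac].aql_tx_pending;
}

The important property is that the two operations are exact inverses,
so the device-wide counter stays consistent across sleep/wake cycles
while the per-STA, per-TID limit continues to apply.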