Rajkumar Manoharan <rmanohar@xxxxxxxxxxxxxx> writes:

> On 2018-10-10 04:15, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan <rmanohar@xxxxxxxxxxxxxx> writes:
>>
>>> On 2018-10-09 05:32, Toke Høiland-Jørgensen wrote:
>>>> This adds airtime accounting and scheduling to the mac80211 TXQ
>>>> scheduler. A new callback, ieee80211_sta_register_airtime(), is
>>>> added that drivers can call to report airtime usage for stations.
>>>>
>>>> When airtime information is present, mac80211 will schedule TXQs
>>>> (through ieee80211_next_txq()) in a way that enforces airtime
>>>> fairness between active stations. This scheduling works the same
>>>> way as the ath9k in-driver airtime fairness scheduling. If no
>>>> airtime usage is reported by the driver, the scheduler will
>>>> default to round-robin scheduling.
>>>>
>>>> For drivers that don't control TXQ scheduling in software, a new
>>>> API function, ieee80211_txq_may_transmit(), is added which the
>>>> driver can use to check if the TXQ is eligible for transmission,
>>>> or should be throttled to enforce fairness. Calls to this function
>>>> must also be enclosed in ieee80211_txq_schedule_{start,end}()
>>>> calls to ensure proper locking. TXQs that are throttled by
>>>> ieee80211_txq_may_transmit() will be woken up again by a check
>>>> added to the ieee80211_wake_txqs() tasklet.
>>>>
>>>
>>> Toke,
>>>
>>> I am observing soft lockup issues again with this new series while
>>> running traffic with 50 clients. I am continuing testing with the
>>> earlier series along with the snippet I shared.
>>
>> Are these new lockups (that were not in your patched previous
>> version), or did I just not get all your lock-related fixes
>> incorporated?
>>
>>> When the driver operates in pull-mode, throttled txqs are marked
>>> and refilled in the airtime tasklet. This is causing major
>>> throughput drops and packet loss, and I suspect the latency in
>>> replenishing the deficit. Whereas in push-mode, or in the ath9k
>>> model, the refill happens more quickly: at every packet indication
>>> as well as on tx completion.
>>
>> Yeah, the tasklet shouldn't be the main source of deficit
>> replenishing. I can see why that would give bad performance :)
>>
>>> I am planning to get rid of the tasklet completely as it is only
>>> meant for pull-mode. It would be better to refill in
>>> may_transmit() itself.
>>
>> Hmm, right. So the way to do this correctly (from a fairness point
>> of view) would be something like this (in may_tx()):
>>
>> if (this_txq.stn.deficit > 0)
>>         return true;
>> else if (any queued TXQ currently has a positive deficit)
>>         return false; /* other TXQ should try may_tx() later and
>>                        * get permission */
>> else /* all deficits <= 0 */
>>         return replenish_deficits(this_txq);
>>
>> And replenish_deficits() would be something like:
>>
>> replenish_deficits(this_txq) {
>> repeat:
>>         for (txq in queued txqs) {
>>                 txq.stn.deficit += txq.stn.weight;
>>                 if (txq.stn.deficit > 0 && !wake_txq)
>>                         wake_txq = txq;
>>         }
>>         if (!wake_txq)
>>                 goto repeat;
>>
>>         if (this_txq.stn.deficit > 0)
>>                 return true;
>>
>>         drv_wake_tx_queue(wake_txq);
>>         return false;
>> }
>>
>> The wake_tx_queue call may still have to be delegated to a tasklet,
>> to avoid the infinite recursion problem I mentioned earlier. But
>> the tasklet could be made simpler and wouldn't have to be called so
>> often...
>>
>> Does the above make sense?
>>
> Hmm... mine is a bit different. txqs are refilled only once for all
> txqs. That will give more opportunity to non-served txqs.
> drv_wake_tx_queue won't be called from may_tx, as the driver anyway
> will not push packets in pull-mode.
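Before getting to your variant: to make my sketch above a bit more
concrete, here is roughly what it would look like as self-contained,
compilable C. To be clear, the types and names below (struct sta,
struct txq, queued_txqs, may_tx(), and so on) are simplified stand-ins
mirroring the pseudocode, not the actual mac80211 structures or API:

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the real mac80211 structures. */
struct sta {
        int deficit;    /* airtime budget; goes negative as airtime is used */
        int weight;     /* share of airtime this station should get */
};

struct txq {
        struct sta *stn;
        struct txq *next;       /* next TXQ on the schedule list */
};

/* Head of the list of currently queued (schedulable) TXQs. */
static struct txq *queued_txqs;

/* Stub; in real code the wake-up would likely be deferred (e.g. to a
 * tasklet) to avoid recursing into the driver. */
static void drv_wake_tx_queue(struct txq *txq)
{
        (void)txq;
}

static bool replenish_deficits(struct txq *this_txq)
{
        struct txq *wake_txq = NULL;
        struct txq *txq;

        /* Keep adding each station's weight to its deficit until at
         * least one queued TXQ becomes eligible. This terminates
         * because this_txq itself is on the queued list and weights
         * are positive. */
        while (!wake_txq) {
                for (txq = queued_txqs; txq; txq = txq->next) {
                        txq->stn->deficit += txq->stn->weight;
                        if (txq->stn->deficit > 0 && !wake_txq)
                                wake_txq = txq;
                }
        }

        if (this_txq->stn->deficit > 0)
                return true;

        /* Another TXQ became eligible first; wake it and throttle the
         * caller until its own deficit goes positive. */
        drv_wake_tx_queue(wake_txq);
        return false;
}

static bool may_tx(struct txq *this_txq)
{
        struct txq *txq;

        if (this_txq->stn->deficit > 0)
                return true;

        /* Some other queued TXQ still has budget; it should call
         * may_tx() later and get permission instead. */
        for (txq = queued_txqs; txq; txq = txq->next)
                if (txq->stn->deficit > 0)
                        return false;

        /* All deficits are <= 0; replenish until someone is eligible. */
        return replenish_deficits(this_txq);
}

Whether drv_wake_tx_queue() can really be called inline like this, or
has to be deferred to a tasklet, depends on the recursion issue I
mentioned above.

As for your version, where drv_wake_tx_queue() is never called from
may_tx() at all: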
So, as far as I can tell, this requires the hardware to "keep
trying"? I.e., if it just stops scheduling a TXQ after may_transmit()
returns false, there is no guarantee that that TXQ will ever get
re-awoken unless a new packet arrives for it?

-Toke