Re: [PATCH v6 3/4] mac80211: Implement Airtime-based Queue Limit (AQL)

Kan Yan <kyan@xxxxxxxxxx> · Fri, 8 Nov 2019 17:22:38 -0800

It is most likely just insufficient locking. active_txq_lock is per
AC, can't protect local->aql_total_pending_airtime against racing
conditions:
void ieee80211_sta_update_pending_airtime(...)
{
        spin_lock_bh(&local->active_txq_lock[ac]);
        ...
        local->aql_total_pending_airtime -= tx_airtime;
        ...
        spin_unlock_bh(&local->active_txq_lock[ac]);
}

After changing it to atomic_t, no more aql_total_pending_airtime
underflow so far :). Using atomic operation should also help reduce
CPU overhead due to lock contention, as
ieee80211_sta_update_pending_airtime() is often called from the tx
completion routine triggered by interrupts, often in a different core
than where __ieee80211_schedule_txq() is running.

I will post a new version a bit later if the test goes well.

Regards,
Kan

On Fri, Nov 8, 2019 at 3:17 AM Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2019-11-08 at 12:10 +0100, Toke Høiland-Jørgensen wrote:
>
> > Right, bugger. I was thinking maybe there's a case where skbs can be
> > cloned (and retain the tx_time_est field) and then released twice?
>
> They could be cloned, but I don't see how that'd be while *inside* the
> stack and then they get reported twice - unless the driver did something
> like that?
>
> I mean, TCP surely does that for example, but it's before we even get to
> mac80211.
>
> > Or
> > maybe somewhere that steps on the skb->cb field in some other way?
> > Couldn't find anything obvious on a first perusal of the TX path code,
> > but maybe you could think of something?
>
> No, sorry. But I also didn't actually look at the driver at all.
>
> > Otherwise I guess we'll be forced to go and do some actual,
> > old-fashioned debugging ;)
>
> :)
>
> johannes
>