Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: > On 2019-04-21 05:15, Toke Høiland-Jørgensen wrote: >> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >> >>> On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote: >>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>> >>>>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote: >>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>> >>>>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote: >>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>> >>>>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote: >>>>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>>>> >>>>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote: >>>>>>>>>>>> This switches the airtime scheduler in mac80211 to use a >>>>>>>>>>>> virtual >>>>>>>>>>>> time-based >>>>>>>>>>>> scheduler instead of the round-robin scheduler used before. >>>>>>>>>>>> This >>>>>>>>>>>> has >>>>>>>>>>>> a >>>>>>>>>>>> couple of advantages: >>>>>>>>>>>> >>>>>>>>>>>> - No need to sync up the round-robin scheduler in >>>>>>>>>>>> firmware/hardware >>>>>>>>>>>> with >>>>>>>>>>>> the round-robin airtime scheduler. >>>>>>>>>>>> >>>>>>>>>>>> - If several stations are eligible for transmission we can >>>>>>>>>>>> schedule >>>>>>>>>>>> both of >>>>>>>>>>>> them; no need to hard-block the scheduling rotation until >>>>>>>>>>>> the >>>>>>>>>>>> head >>>>>>>>>>>> of >>>>>>>>>>>> the >>>>>>>>>>>> queue has used up its quantum. >>>>>>>>>>>> >>>>>>>>>>>> - The check of whether a station is eligible for transmission >>>>>>>>>>>> becomes >>>>>>>>>>>> simpler (in ieee80211_txq_may_transmit()). >>>>>>>>>>>> >>>>>>>>>>>> The drawback is that scheduling becomes slightly more >>>>>>>>>>>> expensive, >>>>>>>>>>>> as >>>>>>>>>>>> we >>>>>>>>>>>> need >>>>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This >>>>>>>>>>>> means >>>>>>>>>>>> that >>>>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of >>>>>>>>>>>> currently >>>>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows >>>>>>>>>>>> too >>>>>>>>>>>> big >>>>>>>>>>>> (it's >>>>>>>>>>>> only TXQs currently backlogged, not all associated stations), >>>>>>>>>>>> so >>>>>>>>>>>> it >>>>>>>>>>>> shouldn't be too big of an issue. >>>>>>>>>>>> >>>>>>>>>>>> @@ -1831,18 +1830,32 @@ void >>>>>>>>>>>> ieee80211_sta_register_airtime(struct >>>>>>>>>>>> ieee80211_sta *pubsta, u8 tid, >>>>>>>>>>>> { >>>>>>>>>>>> struct sta_info *sta = container_of(pubsta, struct >>>>>>>>>>>> sta_info, >>>>>>>>>>>> sta); >>>>>>>>>>>> struct ieee80211_local *local = sta->sdata->local; >>>>>>>>>>>> + struct ieee80211_txq *txq = sta->sta.txq[tid]; >>>>>>>>>>>> u8 ac = ieee80211_ac_from_tid(tid); >>>>>>>>>>>> - u32 airtime = 0; >>>>>>>>>>>> + u64 airtime = 0, weight_sum; >>>>>>>>>>>> + >>>>>>>>>>>> + if (!txq) >>>>>>>>>>>> + return; >>>>>>>>>>>> >>>>>>>>>>>> if (sta->local->airtime_flags & AIRTIME_USE_TX) >>>>>>>>>>>> airtime += tx_airtime; >>>>>>>>>>>> if (sta->local->airtime_flags & AIRTIME_USE_RX) >>>>>>>>>>>> airtime += rx_airtime; >>>>>>>>>>>> >>>>>>>>>>>> + /* Weights scale so the unit weight is 256 */ >>>>>>>>>>>> + airtime <<= 8; >>>>>>>>>>>> + >>>>>>>>>>>> spin_lock_bh(&local->active_txq_lock[ac]); >>>>>>>>>>>> + >>>>>>>>>>>> sta->airtime[ac].tx_airtime += tx_airtime; >>>>>>>>>>>> sta->airtime[ac].rx_airtime += rx_airtime; >>>>>>>>>>>> - sta->airtime[ac].deficit -= airtime; >>>>>>>>>>>> + >>>>>>>>>>>> + weight_sum = local->airtime_weight_sum[ac] ?: >>>>>>>>>>>> sta->airtime_weight; >>>>>>>>>>>> + >>>>>>>>>>>> + local->airtime_v_t[ac] += airtime / weight_sum; >>> Hi Toke, >>> >>> I was porting this version of ATF design to my ath10k platform and >>> found >>> my old kernel version not supporting 64bit division. I'm wondering if >>> it >>> is necessary to use u64 for airtime and weight_sum here though I can >>> find a solution for it. I think u32 might be enough. For airtime, >>> u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 >>> us >>> = 130490, meaning we can support more than 130000 nodes with airtime >>> weight 8092 us. >> >> As Felix said, we don't really want divides in the fast path at all. >> And >> since the divisors are constant, we should be able to just pre-compute >> reciprocals and turn the whole thing into multiplications... >> >>> Another finding was when I configured two 11ac STAs with different >>> airtime weight, such as 256 and 1024 meaning ratio is 1:4, the >>> throughput ratio was not roughly matching the ratio. Could you please >>> share your results? I am not sure if it is due to platform difference. >> >> Hmm, I tested them with ath9k where things seemed to work equivalently >> to the DRR. Are you testing the same hardware with that? Would be a >> good >> baseline. >> >> I am on vacation until the end of the month, but can share my actual >> test results once I get back... > Hi Toke, > I saw your commit in hostapd in > http://patchwork.ozlabs.org/patch/1059334/ > > For dynamic and limit mode described in above hostapd patch, do I need > to change any code in this kernel patch or any other patches am I > missing? Nope, the kernel just exposes the API to set weights, hostapd does everything else :) > After a quick look at the hostapd patch, I guess all the efforts for > both modes are done in hostapd. Correct me if I am wrong. :) You are quite right! -Toke