Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: > On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote: >> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >> >>> On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote: >>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>> >>>>> On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote: >>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>> >>>>>>> On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote: >>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>> >>>>>>>>> On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote: >>>>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>>>> >>>>>>>>>>> On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote: >>>>>>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>>>>>> >>>>>>>>>>>>> On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote: >>>>>>>>>>>>>> Yibo Zhao <yiboz@xxxxxxxxxxxxxx> writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> In a loop txqs dequeue scenario, if the first txq in the >>>>>>>>>>>>>>> rbtree >>>>>>>>>>>>>>> gets >>>>>>>>>>>>>>> removed from rbtree immediately in the >>>>>>>>>>>>>>> ieee80211_return_txq(), >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> loop will break soon in the ieee80211_next_txq() due to >>>>>>>>>>>>>>> schedule_pos >>>>>>>>>>>>>>> not leading to the second txq in the rbtree. Thus, >>>>>>>>>>>>>>> defering >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> removal right before the end of this schedule round. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Co-developed-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx> >>>>>>>>>>>>>>> Signed-off-by: Yibo Zhao <yiboz@xxxxxxxxxxxxxx> >>>>>>>>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxx> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I didn't write this patch, so please don't use my sign-off. >>>>>>>>>>>>>> I'll >>>>>>>>>>>>>> add >>>>>>>>>>>>>> ack or review tags as appropriate in reply; but a few >>>>>>>>>>>>>> comments >>>>>>>>>>>>>> first: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> include/net/mac80211.h | 16 ++++++++++-- >>>>>>>>>>>>>>> net/mac80211/ieee80211_i.h | 3 +++ >>>>>>>>>>>>>>> net/mac80211/main.c | 6 +++++ >>>>>>>>>>>>>>> net/mac80211/tx.c | 63 >>>>>>>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++--- >>>>>>>>>>>>>>> 4 files changed, 83 insertions(+), 5 deletions(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/include/net/mac80211.h >>>>>>>>>>>>>>> b/include/net/mac80211.h >>>>>>>>>>>>>>> index ac2ed8e..ba5a345 100644 >>>>>>>>>>>>>>> --- a/include/net/mac80211.h >>>>>>>>>>>>>>> +++ b/include/net/mac80211.h >>>>>>>>>>>>>>> @@ -925,6 +925,8 @@ struct ieee80211_tx_rate { >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #define IEEE80211_MAX_TX_RETRY 31 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> static inline void ieee80211_rate_set_vht(struct >>>>>>>>>>>>>>> ieee80211_tx_rate >>>>>>>>>>>>>>> *rate, >>>>>>>>>>>>>>> u8 mcs, u8 nss) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> @@ -6232,7 +6234,8 @@ struct sk_buff >>>>>>>>>>>>>>> *ieee80211_tx_dequeue(struct >>>>>>>>>>>>>>> ieee80211_hw *hw, >>>>>>>>>>>>>>> * @ac: AC number to return packets from. >>>>>>>>>>>>>>> * >>>>>>>>>>>>>>> * Should only be called between calls to >>>>>>>>>>>>>>> ieee80211_txq_schedule_start() >>>>>>>>>>>>>>> - * and ieee80211_txq_schedule_end(). >>>>>>>>>>>>>>> + * and ieee80211_txq_schedule_end(). If the txq is empty, >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> will >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>> added >>>>>>>>>>>>>>> + * to a remove list and get removed later. >>>>>>>>>>>>>>> * Returns the next txq if successful, %NULL if no queue >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> eligible. >>>>>>>>>>>>>>> If a txq >>>>>>>>>>>>>>> * is returned, it should be returned with >>>>>>>>>>>>>>> ieee80211_return_txq() >>>>>>>>>>>>>>> after the >>>>>>>>>>>>>>> * driver has finished scheduling it. >>>>>>>>>>>>>>> @@ -6268,7 +6271,8 @@ void >>>>>>>>>>>>>>> ieee80211_txq_schedule_start(struct >>>>>>>>>>>>>>> ieee80211_hw *hw, u8 ac) >>>>>>>>>>>>>>> * @hw: pointer as obtained from ieee80211_alloc_hw() >>>>>>>>>>>>>>> * @ac: AC number to acquire locks for >>>>>>>>>>>>>>> * >>>>>>>>>>>>>>> - * Release locks previously acquired by >>>>>>>>>>>>>>> ieee80211_txq_schedule_end(). >>>>>>>>>>>>>>> + * Release locks previously acquired by >>>>>>>>>>>>>>> ieee80211_txq_schedule_end(). >>>>>>>>>>>>>>> Check >>>>>>>>>>>>>>> + * and remove the empty txq from rb-tree. >>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>> void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, >>>>>>>>>>>>>>> u8 >>>>>>>>>>>>>>> ac) >>>>>>>>>>>>>>> __releases(txq_lock); >>>>>>>>>>>>>>> @@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct >>>>>>>>>>>>>>> ieee80211_hw >>>>>>>>>>>>>>> *hw, struct ieee80211_txq *txq) >>>>>>>>>>>>>>> __acquires(txq_lock) __releases(txq_lock); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /** >>>>>>>>>>>>>>> + * ieee80211_txqs_check - Check txqs waiting for removal >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + * @tmr: pointer as obtained from local >>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>> +void ieee80211_txqs_check(struct timer_list *tmr); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +/** >>>>>>>>>>>>>>> * ieee80211_txq_may_transmit - check whether TXQ is >>>>>>>>>>>>>>> allowed >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> transmit >>>>>>>>>>>>>>> * >>>>>>>>>>>>>>> * This function is used to check whether given txq is >>>>>>>>>>>>>>> allowed >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> transmit by >>>>>>>>>>>>>>> diff --git a/net/mac80211/ieee80211_i.h >>>>>>>>>>>>>>> b/net/mac80211/ieee80211_i.h >>>>>>>>>>>>>>> index a4556f9..49aa143e 100644 >>>>>>>>>>>>>>> --- a/net/mac80211/ieee80211_i.h >>>>>>>>>>>>>>> +++ b/net/mac80211/ieee80211_i.h >>>>>>>>>>>>>>> @@ -847,6 +847,7 @@ struct txq_info { >>>>>>>>>>>>>>> struct codel_stats cstats; >>>>>>>>>>>>>>> struct sk_buff_head frags; >>>>>>>>>>>>>>> struct rb_node schedule_order; >>>>>>>>>>>>>>> + struct list_head candidate; >>>>>>>>>>>>>>> unsigned long flags; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* keep last! */ >>>>>>>>>>>>>>> @@ -1145,6 +1146,8 @@ struct ieee80211_local { >>>>>>>>>>>>>>> u64 airtime_v_t[IEEE80211_NUM_ACS]; >>>>>>>>>>>>>>> u64 airtime_weight_sum[IEEE80211_NUM_ACS]; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + struct list_head remove_list[IEEE80211_NUM_ACS]; >>>>>>>>>>>>>>> + struct timer_list remove_timer; >>>>>>>>>>>>>>> u16 airtime_flags; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> const struct ieee80211_ops *ops; >>>>>>>>>>>>>>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c >>>>>>>>>>>>>>> index e9ffa8e..78fe24a 100644 >>>>>>>>>>>>>>> --- a/net/mac80211/main.c >>>>>>>>>>>>>>> +++ b/net/mac80211/main.c >>>>>>>>>>>>>>> @@ -667,10 +667,15 @@ struct ieee80211_hw >>>>>>>>>>>>>>> *ieee80211_alloc_hw_nm(size_t priv_data_len, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> for (i = 0; i < IEEE80211_NUM_ACS; i++) { >>>>>>>>>>>>>>> local->active_txqs[i] = RB_ROOT_CACHED; >>>>>>>>>>>>>>> + INIT_LIST_HEAD(&local->remove_list[i]); >>>>>>>>>>>>>>> spin_lock_init(&local->active_txq_lock[i]); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + timer_setup(&local->remove_timer, ieee80211_txqs_check, >>>>>>>>>>>>>>> 0); >>>>>>>>>>>>>>> + mod_timer(&local->remove_timer, >>>>>>>>>>>>>>> + jiffies + >>>>>>>>>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS)); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> INIT_LIST_HEAD(&local->chanctx_list); >>>>>>>>>>>>>>> mutex_init(&local->chanctx_mtx); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct >>>>>>>>>>>>>>> ieee80211_hw >>>>>>>>>>>>>>> *hw) >>>>>>>>>>>>>>> tasklet_kill(&local->tx_pending_tasklet); >>>>>>>>>>>>>>> tasklet_kill(&local->tasklet); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + del_timer_sync(&local->remove_timer); >>>>>>>>>>>>>>> #ifdef CONFIG_INET >>>>>>>>>>>>>>> unregister_inetaddr_notifier(&local->ifa_notifier); >>>>>>>>>>>>>>> #endif >>>>>>>>>>>>>>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c >>>>>>>>>>>>>>> index d00baaa..42ca010 100644 >>>>>>>>>>>>>>> --- a/net/mac80211/tx.c >>>>>>>>>>>>>>> +++ b/net/mac80211/tx.c >>>>>>>>>>>>>>> @@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct >>>>>>>>>>>>>>> ieee80211_sub_if_data *sdata, >>>>>>>>>>>>>>> codel_stats_init(&txqi->cstats); >>>>>>>>>>>>>>> __skb_queue_head_init(&txqi->frags); >>>>>>>>>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order); >>>>>>>>>>>>>>> + INIT_LIST_HEAD(&txqi->candidate); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> txqi->txq.vif = &sdata->vif; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -3724,6 +3725,9 @@ void ieee80211_schedule_txq(struct >>>>>>>>>>>>>>> ieee80211_hw >>>>>>>>>>>>>>> *hw, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> spin_lock_bh(&local->active_txq_lock[ac]); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + if (!list_empty(&txqi->candidate)) >>>>>>>>>>>>>>> + list_del_init(&txqi->candidate); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order)) >>>>>>>>>>>>>>> goto out; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -3783,6 +3787,20 @@ static void >>>>>>>>>>>>>>> __ieee80211_unschedule_txq(struct >>>>>>>>>>>>>>> ieee80211_hw *hw, >>>>>>>>>>>>>>> RB_CLEAR_NODE(&txqi->schedule_order); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +void ieee80211_remove_txq(struct ieee80211_hw *hw, >>>>>>>>>>>>>>> + struct ieee80211_txq *txq) >>>>>>>>>>>>>>> +{ >>>>>>>>>>>>>>> + struct ieee80211_local *local = hw_to_local(hw); >>>>>>>>>>>>>>> + struct txq_info *txqi = to_txq_info(txq); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + lockdep_assert_held(&local->active_txq_lock[txq->ac]); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + if (!RB_EMPTY_NODE(&txqi->schedule_order)) { >>>>>>>>>>>>>>> + __ieee80211_unschedule_txq(hw, txq); >>>>>>>>>>>>>>> + list_del_init(&txqi->candidate); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> void ieee80211_unschedule_txq(struct ieee80211_hw *hw, >>>>>>>>>>>>>>> struct ieee80211_txq *txq) >>>>>>>>>>>>>>> __acquires(txq_lock) __releases(txq_lock) >>>>>>>>>>>>>>> @@ -3790,7 +3808,7 @@ void ieee80211_unschedule_txq(struct >>>>>>>>>>>>>>> ieee80211_hw *hw, >>>>>>>>>>>>>>> struct ieee80211_local *local = hw_to_local(hw); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> spin_lock_bh(&local->active_txq_lock[txq->ac]); >>>>>>>>>>>>>>> - __ieee80211_unschedule_txq(hw, txq); >>>>>>>>>>>>>>> + ieee80211_remove_txq(hw, txq); >>>>>>>>>>>>>>> spin_unlock_bh(&local->active_txq_lock[txq->ac]); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -3803,11 +3821,48 @@ void ieee80211_return_txq(struct >>>>>>>>>>>>>>> ieee80211_hw >>>>>>>>>>>>>>> *hw, >>>>>>>>>>>>>>> lockdep_assert_held(&local->active_txq_lock[txq->ac]); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (!RB_EMPTY_NODE(&txqi->schedule_order) && >>>>>>>>>>>>>>> - (skb_queue_empty(&txqi->frags) && >>>>>>>>>>>>>>> !txqi->tin.backlog_packets)) >>>>>>>>>>>>>>> - __ieee80211_unschedule_txq(hw, txq); >>>>>>>>>>>>>>> + !txq_has_queue(&txqi->txq) && >>>>>>>>>>>>>>> + list_empty(&txqi->candidate)) >>>>>>>>>>>>>>> + list_add_tail(&txqi->candidate, >>>>>>>>>>>>>>> &local->remove_list[txq->ac]); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> EXPORT_SYMBOL(ieee80211_return_txq); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +void __ieee80211_check_txqs(struct ieee80211_local >>>>>>>>>>>>>>> *local, >>>>>>>>>>>>>>> int >>>>>>>>>>>>>>> ac) >>>>>>>>>>>>>>> +{ >>>>>>>>>>>>>>> + struct txq_info *iter, *tmp; >>>>>>>>>>>>>>> + struct sta_info *sta; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + lockdep_assert_held(&local->active_txq_lock[ac]); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + list_for_each_entry_safe(iter, tmp, >>>>>>>>>>>>>>> &local->remove_list[ac], >>>>>>>>>>>>>>> + candidate) { >>>>>>>>>>>>>>> + sta = container_of(iter->txq.sta, struct sta_info, >>>>>>>>>>>>>>> sta); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + if (txq_has_queue(&iter->txq)) >>>>>>>>>>>>>>> + list_del_init(&iter->candidate); >>>>>>>>>>>>>>> + else >>>>>>>>>>>>>>> + ieee80211_remove_txq(&local->hw, &iter->txq); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +void ieee80211_txqs_check(struct timer_list *t) >>>>>>>>>>>>>>> +{ >>>>>>>>>>>>>>> + struct ieee80211_local *local = from_timer(local, t, >>>>>>>>>>>>>>> remove_timer); >>>>>>>>>>>>>>> + struct txq_info *iter, *tmp; >>>>>>>>>>>>>>> + struct sta_info *sta; >>>>>>>>>>>>>>> + int ac; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { >>>>>>>>>>>>>>> + spin_lock_bh(&local->active_txq_lock[ac]); >>>>>>>>>>>>>>> + __ieee80211_check_txqs(local, ac); >>>>>>>>>>>>>>> + spin_unlock_bh(&local->active_txq_lock[ac]); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + mod_timer(&local->remove_timer, >>>>>>>>>>>>>>> + jiffies + >>>>>>>>>>>>>>> msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS)); >>>>>>>>>>>>>>> +} >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'll ask the same as I did last time (where you told me to >>>>>>>>>>>>>> hold >>>>>>>>>>>>>> off >>>>>>>>>>>>>> until this round): >>>>>>>>>>>>>> >>>>>>>>>>>>>> Why do you need the timer and the periodic check? If TXQs >>>>>>>>>>>>>> are >>>>>>>>>>>>>> added >>>>>>>>>>>>>> to >>>>>>>>>>>>>> the remove list during the scheduling run, and >>>>>>>>>>>>>> __ieee80211_check_txqs() >>>>>>>>>>>>>> is run from schedule_end(), isn't that sufficient to clear >>>>>>>>>>>>>> the >>>>>>>>>>>>>> list? >>>>>>>>>>>>> Is it possible that a txq is not added to the remove list >>>>>>>>>>>>> but >>>>>>>>>>>>> then >>>>>>>>>>>>> packets in it are dropped by fq_codel algo? Like the station >>>>>>>>>>>>> disconnects >>>>>>>>>>>>> without any notification. >>>>>>>>>>>> >>>>>>>>>>>> Well as long as all the other cleanup paths call directly >>>>>>>>>>>> into >>>>>>>>>>>> __unschedule_txq(), that should remove stations from the >>>>>>>>>>>> scheduler >>>>>>>>>>>> when >>>>>>>>>>>> they disconnect etc. >>>>>>>>>>> Yes, the disconnect scenario is a bad example. My concern is, >>>>>>>>>>> say, >>>>>>>>>>> we >>>>>>>>>>> have 10 stations and only one of them is assigned a very small >>>>>>>>>>> weight >>>>>>>>>>> compared with that of others. Suppose, after its chance of Tx, >>>>>>>>>>> it >>>>>>>>>>> is >>>>>>>>>>> most likely to be placed in the rightmost(still has some >>>>>>>>>>> packets >>>>>>>>>>> in >>>>>>>>>>> the >>>>>>>>>>> txq) and no more incoming data for it. The remaining packets >>>>>>>>>>> in >>>>>>>>>>> txq >>>>>>>>>>> will >>>>>>>>>>> be dropped due to timeout algo in codel(correct me if I am >>>>>>>>>>> wrong) >>>>>>>>>>> but >>>>>>>>>>> this empty txq will stay on the rbtree until other txqs get >>>>>>>>>>> drained >>>>>>>>>>> or >>>>>>>>>>> global vt catch up with its vt. The staying time could be long >>>>>>>>>>> if >>>>>>>>>>> weight >>>>>>>>>>> is extremely small. Then do we need timer to check or any >>>>>>>>>>> other >>>>>>>>>>> better >>>>>>>>>>> solution? >>>>>>>>>> >>>>>>>>>> Ah, I see what you mean. No, I don't think this will be a >>>>>>>>>> problem; >>>>>>>>>> the >>>>>>>>>> scenario you're describing would play out like this: >>>>>>>>>> >>>>>>>>>> 1. Station ends transmitting, still has a single packet queued, >>>>>>>>>> gets >>>>>>>>>> moved to the end of the rbtree (and stays there for a >>>>>>>>>> while). >>>>>>>>>> >>>>>>>>>> 2. When we finally get to the point where this station gets >>>>>>>>>> another >>>>>>>>>> chance to transmit, the CoDel drop timer triggers and the >>>>>>>>>> last >>>>>>>>>> packet >>>>>>>>>> is dropped[0]. This means that the queue will just be empty >>>>>>>>>> (and ieee80211_tx_dequeue() will return NULL). >>>>>>>>>> >>>>>>>>>> 3. Because the queue is empty, ieee80211_return_txq() will not >>>>>>>>>> put >>>>>>>>>> it >>>>>>>>>> back on the rbtree. >>>>>>>>>> >>>>>>>>>> Crucially, in 2. the CoDel algorithm doesn't kick in until the >>>>>>>>>> point >>>>>>>>>> of >>>>>>>>>> packet dequeue. But even if an empty queue stays on the rbtree >>>>>>>>>> for >>>>>>>>>> a >>>>>>>>>> while, there is no harm in that: eventually it will get its >>>>>>>>>> turn, >>>>>>>>>> it >>>>>>>>>> will turn out to be empty, and just be skipped over. >>>>>>>>> Then that will be fine. Thanks for the explanation of the >>>>>>>>> dropping >>>>>>>>> part >>>>>>>>> in CoDel algorithm. >>>>>>>> >>>>>>>> Yup, think so. And you're welcome :) >>>>>>>> >>>>>>>>>> The issue we need to be concerned about is the opposite: If we >>>>>>>>>> have >>>>>>>>>> a >>>>>>>>>> queue that *does* have packets queued, but which is *not* >>>>>>>>>> scheduled >>>>>>>>>> for >>>>>>>>>> transmission, that will stall TX. >>>>>>>>> Is it by design since its vt is more than global vt, right? The >>>>>>>>> lattency >>>>>>>>> may somehow get impacted though. >>>>>>>> >>>>>>>> Well, it should still stay on the rbtree as long as it has >>>>>>>> packets >>>>>>>> queued. We don't have a check anywhere that reschedules TXQs >>>>>>>> whose >>>>>>>> v_t >>>>>>>> drops below global v_t... >>>>>>>> >>>>>>>>>> [0] CoDel in most cases only drops a single packet at a time, >>>>>>>>>> so >>>>>>>>>> it >>>>>>>>>> will >>>>>>>>>> not clear out an entire queue with multiple packets in one go. >>>>>>>>>> But >>>>>>>>>> you >>>>>>>>>> are right that it could conceivably drop the last packet in a >>>>>>>>>> queue. >>>>>>>>>> >>>>>>>>>>>> We only need to defer removal inside a single "scheduling >>>>>>>>>>>> round" >>>>>>>>>>>> (i.e., >>>>>>>>>>>> between a pair of ieee80211_txq_schedule_start/end. So if we >>>>>>>>>>>> just >>>>>>>>>>>> walk >>>>>>>>>>>> the remove list in schedule_end() we should be enough, no? >>>>>>>>>>>> >>>>>>>>>>>> Hmm, or maybe a simpler way to fix the original issue is just >>>>>>>>>>>> to >>>>>>>>>>>> have >>>>>>>>>>>> unschedule_txq() update the schedule_pos() pointer? >>>>>>>>>>>> >>>>>>>>>>>> I.e., unschedule_txq checks if the txq being removed is >>>>>>>>>>>> currently >>>>>>>>>>>> being >>>>>>>>>>>> pointed to by schedule_pos[ac], and if it is, it updates >>>>>>>>>>>> schedule_pos >>>>>>>>>>>> to >>>>>>>>>>>> be the rb_next of the current value? >>>>>>>>>>> Actually, if schedule_pos is updated to rb_next of the current >>>>>>>>>>> value, >>>>>>>>>>> then in the next_txq() where we are going to use rb_next again >>>>>>>>>>> and >>>>>>>>>>> finally pick the next node of the node we really want. Is it >>>>>>>>>>> fine >>>>>>>>>>> to >>>>>>>>>>> update schedule_pos to NULL? >>>>>>>>>> >>>>>>>>>> Hmm, yeah, good point. >>>>>>>>>> >>>>>>>>>> If we do end up setting schedule_pos to NULL in the middle of a >>>>>>>>>> scheduling round, that will make next_txq() "start over", and >>>>>>>>>> do >>>>>>>>>> another >>>>>>>>>> loop through the whole thing. I guess we may be able hit a case >>>>>>>>>> where >>>>>>>>>> things can oscillate back and forth between addition and >>>>>>>>>> removal >>>>>>>>>> resulting in an infinite loop? Not sure, but at least I can't >>>>>>>>>> seem >>>>>>>>>> to >>>>>>>>>> convince myself that this can't happen. >>>>>>>>> >>>>>>>>> As the loop of next_txq under lock protection as below, >>>>>>>>> >>>>>>>>> txq_schedule_start(); >>>>>>>>> while(txq=next_txq()){ >>>>>>>>> ... >>>>>>>>> return_txq(txq); >>>>>>>>> } >>>>>>>>> txq_schedule_end(); >>>>>>>>> >>>>>>>>> I do not see any chance of addition, no? >>>>>>>> >>>>>>>> As you noted in your other email, Felix reduced the locking. And >>>>>>>> yeah, >>>>>>>> we need to rebase this series to also incorporate that. I figure >>>>>>>> I >>>>>>>> can >>>>>>>> send an updated version of the first patch in the series once >>>>>>>> we've >>>>>>>> worked out the remaining issues with your follow-up patches. >>>>>>>> >>>>>>> Oh, I was thinking we were discussing without locking reduced. >>>>>>> Yes, >>>>>>> I >>>>>>> also agree there might be a case causing infinite loop. With >>>>>>> locking >>>>>>> reduced, the tree can be adjusted between next_txq() and >>>>>>> return_txq() >>>>>>> in >>>>>>> the loop situation. For further discussion, let 's consider, >>>>>>> 1) the tree starts like: >>>>>>> A->B->C->D->E >>>>>>> 2) then next_txq() returns A for dequeuing >>>>>>> 3) driver dequeues A and draines A without any active txq locked >>>>>>> meaning >>>>>>> the tree could be changed upon Tx compeletion. >>>>>>> 4) then in return_txq(), the tree could be, >>>>>>> i A->B->C->D->E (A is empty, and maybe soon be added >>>>>>> back >>>>>>> before the loop end) >>>>>>> ii B->C->A->D->E (A is empty, and maybe soon be added >>>>>>> back >>>>>>> before the loop end) >>>>>>> iii B->C->D->E->A (A is empty, and maybe soon be added >>>>>>> back >>>>>>> before the loop end) >>>>>>> >>>>>>> with this change: >>>>>>> local->schedule_pos[ac] = rb_next(node) ?: rb_prev(node); >>>>>>> >>>>>>> for case i, local->schedule_pos[ac] is rb_next(A) which is B, and >>>>>>> in >>>>>>> next_txq(), rb_next(B) is what we returns which actually is C and >>>>>>> B >>>>>>> is >>>>>>> skipped, no? >>>>>>> >>>>>>> Similiar for case ii, we skip B, C, D. >>>>>> >>>>>> Yup, I think you're right. But if we can fix this by making >>>>>> ieee80211_resort_txq() aware of the schedule_pos as well, no? I.e., >>>>>> if >>>>>> resort_txq() acts on the txq that's currently in schedule_pos, it >>>>>> will >>>>>> update schedule pos with the same rb_next(node) ?: rb_prev(node); >>>>>> (optionally after checking that the position of the node is >>>>>> actually >>>>>> going to change). >>>>> Sorry, please igore last email sent by mistake. >>>>> >>>>> I don't think it makes any difference with that in unschedule_txq(). >>>>> For >>>>> case i, it finally picks C as well in next_txq(). For next_txq(), >>>>> schedule_pos means previous candidate node whereas with your change, >>>>> it >>>>> looks like schedule_pos is current candidate node instead. >>>> >>>> Hmm, that was not actually what I was thinking, but yeah I think >>>> you're >>>> right that it would be easier to just change it so schedule_pos is >>>> pointing to the next and not the current txq we want to schedule. >>> So do you mean we can change next_txq like this, >>> >>> struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 >>> ac) >>> { >>> struct ieee80211_local *local = hw_to_local(hw); >>> struct rb_node *node = local->schedule_pos[ac]; >>> struct txq_info *txqi = NULL; >>> bool first = false; >>> >>> lockdep_assert_held(&local->active_txq_lock[ac]); >>> >>> if (!node) { >>> node = rb_first_cached(&local->active_txqs[ac]); >>> first = true; >>> - } else >>> - node = rb_next(node); >>> + } >>> + >>> if (!node) >>> return NULL; >> >> Ah, no, now I remember why this didn't work and I went with the other >> approach: If you make this change, you also have to have this at the >> end: >> >> local->schedule_pos[ac] = rb_next(node); >> >> >> But this means we can no longer distinguish between having gone through >> the whole thing (so rb_next() returns NULL), or starting out with >> nothing. >> >> So, instead we need to keep next_txq() the way it is, and just add > > Right, should keep next_txq() the way it is. > >> >> local->schedule_pos[ac] = rb_prev(node); >> >> whenever we remove a node (both in return_txq() and resort_txq()). > > Agree, and also we may need to consider case like A is removed and soon > be added back just the same as ii), > B->C->A->D->E > then B is schedule, removed and soon added back, > C->A->B->D->E > A and B will have a second chance to be scheduled and this may happen to > others as well leading to the infinite loop as you have mentioned > previously, so do we need to maintain a schedule_round like we do in > DRR? Like, > - If the node is in the same round, by pass schedule, go to > rb_next(), either continue loop this round or end this round. > - Increase the schedule_round at the schedule_start() only when the > schedule_pos is NULL. Hmm, yeah, I guess we could end up with a loop like that as well. Keeping the schedule_round would be a way to fix it, but I'm not sure we should just skip that station; maybe we should just end the round instead? >>>> We'd still need a check in resort_txq() then, but it would make it >>>> safe >>>> to unschedule in return_txq()... >>> Yes, agree with that. >>> >>> >>>> >>>>>>> Also I am wondering if there will be some SMP issues relating with >>>>>>> local->schedule_pos[ac]. >>>>>> >>>>>> Not sure what you mean by this? >>>>> My bad. Please ignore this. >>>>> >>>>> >>>>>> >>>>>>>>> In ath10k, we will usually push packets of first txq as many as >>>>>>>>> we >>>>>>>>> can >>>>>>>>> until it is drained and then move to the next one. So if a txq >>>>>>>>> gets >>>>>>>>> removed in the return_txq, it should always be the leftmost. And >>>>>>>>> during this period, neither vt of any station or global vt can >>>>>>>>> be >>>>>>>>> updated due to lock protection. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> But in that case, we could fix it by just conditionally >>>>>>>>>> assigning >>>>>>>>>> either >>>>>>>>>> rb_next or rb_prev to the schedule_pos in unschedule_txq()? >>>>>>>>>> I.e., >>>>>>>>>> something like: >>>>>>>>>> >>>>>>>>>> local->schedule_pos[ac] = rb_next(node) ?: rb_prev(node); >>>>>>>>> I am not sure I am getting your point. Still in next_txq, >>>>>>>>> schedule_pos[ac] will lead us to the next node of the one we >>>>>>>>> want. >>>>>>>> >>>>>>>> The logic in next_txq is different when schedule_pos[ac] is NULL, >>>>>>>> vs >>>>>>>> when rb_next(schedule_pos[ac]) is NULL. The former restarts a new >>>>>>>> scheduling round, while the latter ends the current round. >>>>>>>> >>>>>>>> -Toke >>>>>>> >>>>>>> -- >>>>>>> Yibo >>>>> >>>>> -- >>>>> Yibo >>> >>> -- >>> Yibo > > -- > Yibo