Felix Fietkau <nbd@xxxxxxxx> writes: >>> I tried to avoid it, but I didn't find a better way to do it. I added it >>> in order to define the number of levels for the skiplist only once and >>> make the code resolve it for the individual functions at compile time in >>> a type safe way. >>> Other data structures in the kernel don't need this, because their >>> member node struct size typically doesn't change based on a given parameter. >> >> Well, only way I can think of would be doing something like: >> >> #define MAX_SKIPLIST_LEVELS 5 >> struct skiplist_node { >> struct skiplist_node *next[MAX_SKIPLIST_LEVELS]; >> }; >> >> And then having the init function take the actual number of levels for >> that instance and store it in the _list struct. Then, if someone wants >> to use the skiplist with more levels at some later point, they'd bump >> the MAX_SKIPLIST_LEVELS define and use that for their initialiser. > That's exactly what I wanted to avoid. For lists with lots of elements > you might need a lot more levels, which would add a significant amount > of bloat for small lists. Right, fair enough, let's keep the macro thing, then. I have an idea that I'd like to test out for a skiplist-based BPF map type which would require runtime sizing, but I can probably fudge that at least until I've figured out if it's feasible, then we can revisit if needed. >>>> So this implies that we always schedule in airtime order (i.e., enforce >>>> fairness) for any driver that can get a meaningful value returned from >>>> ieee80211_calc_expected_tx_airtime(), right? That's probably OK, but >>>> just want to make sure we've thought through all the implications of >>>> this. >>>> >>>> A comment here explaining why this is done would be useful; it's a bit >>>> counter-intuitive when just looking at the code. Your comment in the >>>> commit message implies that scheduling doesn't work correctly if this is >>>> not done, but then what happens if airtime is 0 and we bail out above? >>> I guess I need to add something to deal with that corner case, maybe by >>> returning the smallest possible value for expected tx airtime if it >>> can't be calculated. >> >> I think there are two approaches here: >> >> - Make sure we always return at least '1' for airtime, so the counter >> always increases on every schedule (effectively making it a frame >> counter instead). >> >> - Make sure the skiplist does the right thing if all entries enqueued >> have the same (0) value (i.e., that it degrades to round-robin). >> >> I think the skiplist does in fact degrade to round-robin since it treats >> equality the same as "greater than", but it may be somewhat inefficient >> when inserting in this case? Or? > The delete logic in my skiplist code can't properly deal with multiple > items having the same value. Adding that would likely make the > implementation somewhat more complex, so I decided to sidestep it by > making it use an internal wrapper for the comparison function which > compares the pointer address in case of equal value. Ah, right, missed that bit; was only looking at insert. Okay, but even if you're returning a fixed minimum value for stations with no real airtime estimation, that would still result in multiple entries with the same value with a high probability. Hmm, how about fixing it by adding an "insert epoch" to the skiplist_node? I.e., on insert, walk all the nodes with the same cmp() value and set this field to the highest value? Like cmp(a, b) { cmpval = __cmpfunc(a, b); if (!cmpval) return a->epoch - b->epoch; } insert() { while (!cmp(node, list[i])) { i++; node->epoch++; } } (doesn't quite work, but hopefully you get the gist of it? epoch should be reset on dequeue of course). Something like this would preserve equal-value elements in the same order across insert-remove, at the cost of making insert less efficient (I think?) if many items share the same value. Maybe that's acceptable? >>>> How can this happen in normal operation? This implies that a TXQ was >>>> scheduled without a backlog; isn't that a bug that we should warn on? >>> At least mt76 assumes that calling ieee80211_next_txq in a loop during a >>> scheduling round will eventually return NULL, even when no frames were >>> queued. ath9k could potentially also need this, depending on the block >>> ack window state. >>> This assumption was valid for the previous implementation, and I figured >>> it would be cleaner and more robust to preserve it. >> >> Sure, I think we should; but doesn't that already happen above, at the >> 'if (!node)' check? I.e., if the skiplist is empty we'll return NULL. > In the state that I'm talking about, the skiplist won't be empty. I'm > assuming that the first txq in the list has some queued packets, has not > used its AQL budget yet, but the driver still can't pull more packets > from it because of driver specific conditions. In that case, it will run > a loop where ieee80211_next_txq returns the same queue over and over > again, and the driver returns it each time (which causes it to get added > back to the list). But that's what the check against schedule_pos is for, right? I.e., making sure that we return NULL the second time we encounter the same node in a scheduling round. What I'm talking about is the last condition check in this while loop: + while (1) { + txqi = to_txq_info(txq); + if (test_and_clear_bit(IEEE80211_TXQ_FORCE_ACTIVE, &txqi->flags)) + break; + + if (txq_has_queue(txq)) + break; + + airtime_info_next_txq_idx(air_info); + txq = air_info->txq[air_info->txq_idx]; + if (txq_idx == air_info->txq_idx) + goto begin; + } (the 'goto begin'). That will only get hit if we've looped through all the TXQs in an air_info, without finding any that has either the force bit or a packet queued. But if we agreed (below) that the logic in ieee80211_return_txq() is: if (force || (txq_has_queue(txq) && ieee80211_txq_airtime_check(hw, &txqi->txq))) __ieee80211_insert_txq(local, air_sched, txqi); else __ieee80211_unschedule_txq(hw, txq, false); ... how does an air_info end up in the skiplist without having either the force bit or a backlog in at least one TXQ? >>>> This sets the bit even if the AQL check fails below; is that intentional? >>> Yes. The bit indicates that the queue should be passed to the driver >>> even when mac80211 has no frames queued for it (presumably because the >>> driver has queued some internally). >> >> But it won't actually be inserted into the rotation if the AQL check >> fails (because then the 'else if' check of 'force' won't happen)? >> >> I think the right logic would be: >> >> if (force || (txq_has_queue(txq) && ieee80211_txq_airtime_check(hw, &txqi->txq))) >> __ieee80211_insert_txq(local, air_sched, txqi); >> else >> __ieee80211_unschedule_txq(hw, txq, false); >> >> no? > I thought about it some more, and I think you're right. I was not > putting it back into rotation under the assumption that pulled but not > fully enqueued packets in the driver don't take up a significant chunk > of the AQL budget, but that assumption may not always be true, depending > on the driver approach and the tx rate (especially with CCK). Yup, also the AQL limits are user-configurable, so it's probably best not to make any assumptions about their magnitude... >>>> I get the intention behind this, and think it's (probably) reasonable. >>>> But testing it with an actual ath10k device in push/pull mode would be >>>> good :) >>> I'm not very familiar with ath10k, could you help me with that? >> >> Sadly I don't have a good handle on the ath10k either, and there are >> already issues with the current virtual time-based code in ath10k >> push/pull-mode. There is some discussion on the openwrt forum[0], which >> indicates that this patch doesn't help with the issue either. >> >> Unless someone shows up with the time, motivation and hardware to >> properly fix this, I think the most sensible thing to do may be to just >> turn the whole of ieee80211_txq_may_transmit() into a wrapper around >> ieee80211_txq_airtime_check() (i.e., do the AQL check, don't bother with >> fairness). WDYT? > I agree. My guess is that proper fairness is probably incompatible with > push-pull mode behavior of the firmware anyway. Yeah, I suspect the reason the old code (back when we did round-robin scheduling) "works" is that it just keeps looping until the TXQ is eligible, so it's not enforcing fairness anyway... -Toke