On 12/5/19 8:34 AM, Johannes Berg wrote:
Hi Toke, all,

I'm debugging some throughput issues and wondered if you had a hint. This is at HE rates 2x2 80 MHz, so you'd expect ~1 Gbps or a bit more; I'm getting ~900 Mbps. Just to set the stage.

What I think is (part of) the problem is that I see in the logs that our hardware queues become empty every once in a while. This seems to be when/because ieee80211_tx_dequeue() returns NULL, and we hit the

	skb = ieee80211_tx_dequeue(hw, txq);
	if (!skb) {
		if (txq->sta)
			IWL_DEBUG_TX(mvm,
				     "TXQ of sta %pM tid %d is now empty\n",
				     txq->sta->addr,
				     txq->tid);

printout, e.g.

	iwlwifi 0000:00:14.3: I iwl_mvm_mac_itxq_xmit TXQ of sta 0c:9d:92:03:12:44 tid 0 is now empty

This isn't always bad, but in most cases where I see it happen the hardware queue actually is rather shallow at the time, say only 57 packets in some instance. Then we can basically send all the packets in the queue in one or two aggregations (I see here an example with 57 packets in the queue, ieee80211_tx_dequeue() returns NULL, and we then send an A-MPDU with 38 followed by one with 19 packets, making the HW queue empty.)

This is with 10 simultaneous TCP streams, so there *shouldn't* be any issues with that. I did indeed try to lower the pacing shift and it had no effect. I couldn't try with just one or two streams (actually one stream is not enough because the AP has only GBit LAN ... so in the ideal case wireless is faster than ethernet!!) - somehow the test hangs then, but I'll get back to that later.

Anyhow, do you have any tips on debugging this? This is still without AQL code. The AQM stats for the AP look fine, basically everything is 0 except for "new-flows", "tx-bytes" and "tx-packets".

One thing that does seem odd is that the new-flows counter is increasing this rapidly - shouldn't we expect it to be like 10 new flows for 10 TCP sessions? I see this counter increase by the thousands per second.

I don't see any calls to __ieee80211_stop_queue() either, as expected (per trace-cmd).

CPU load is not an issue AFAICT. Even with all the debugging being written into the syslog (or journal or something), that's the only thing that takes noticeable CPU time - ~50% for systemd-journal and ~20% for rsyslogd, <10% for the throughput testing program and that's about it. The system has 4 threads and seems mostly idle.

All this seems to mean that the TCP stack isn't feeding us fast enough, but is that really possible?
Does UDP work better? Or pktgen?

Thanks,
Ben
Any other ideas?

Thanks,
johannes
-- 
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com