Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 04/28/2020 09:27 AM, Dave Taht wrote:
>> On Tue, Apr 28, 2020 at 7:56 AM Steve deRosier <derosier@xxxxxxxxx> wrote:
>>>
>>> On Mon, Apr 27, 2020 at 7:54 AM <greearb@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>>>>
>>>> While running tcp upload + download tests with ~200
>>>> concurrent TCP streams, 1-2 processes, and 30 station
>>>> vdevs, I noticed that the __ieee80211_stop_queue was taking
>>>> around 20% of the CPU according to perf-top, with other locking
>>>> taking an additional ~15%.
>>>>
>>>> I believe the issue is that the ath10k driver would unlock the
>>>> txqueue when a single frame could be transmitted, instead of
>>>> waiting for a low water mark.
>>>>
>>>> So, this patch adds a low-water mark that is 1/4 of the total
>>>> tx buffers allowed.
>>>>
>>>> This appears to resolve the performance problem that I saw.
>>>>
>>>> Tested with recent wave-1 ath10k-ct firmware.
>>>>
>>>
>>> Hey Ben,
>>>
>>> Did you do any testing with this patch around latency? The nature of
>>> the thing that you fixed makes me wonder if it was intentional with
>>> respect to making WiFi fast - i.e. getting rid of buffers as much as
>>> possible. Obviously the CPU impact is likely to be an unintended
>>> consequence. In any case, I don't know anything for sure, it was just
>>> a thought that went through my head when reading this.
>>
>> I note that I'd prefer a BQL-like high/low watermark approach in
>> general... bytes, not packets, or better, perceived
>> airtime in a revision of AQL...
>>
>> ... but we'll try this patch, thx!
>
> Is there a nice diagram somewhere that shows where the various
> buffer-bloat technologies sit in the stack?

Not really. Best thing I know of is the one I drew myself: Figure 3 of
this paper:
https://www.usenix.org/system/files/conference/atc17/atc17-hoiland-jorgensen.pdf

That is still a semi-accurate representation of the queueing structure
in mac80211.
For ath10k, just imagine that the bit that says "ath9k driver" is part
of mac80211, and that the "HW queue" is everything the driver and
firmware do... AQL also activates in the circle labelled "RR" there,
but the figure predates AQL.

> I suspect such should be above the txqueue start/stop logic in the
> driver that I mucked with, and certainly the old behaviour has no
> obvious tie-in with any higher-level algorithms.
>
> I doubt my patch will change much except in pathological cases where
> the system is transmitting frames fast enough to fully fill the tx
> buffers and also loaded in such a way that it can process just a few
> tx frames at a time to keep bouncing to full and not full over and
> over.

The latter part of that ("can process just a few tx frames at a time")
mostly happens when many stations are active at the same time, right?

-Toke
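
To make the "bouncing to full and not full" case discussed above concrete, here is a toy model in C. It is only a sketch under stated assumptions, not the ath10k code: the struct, the function names, and the exact wake condition (wake once pending frames drop to the low-water mark) are all hypothetical. It counts how often a saturating sender gets its queue stopped when completions arrive a few frames at a time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver tx queue; not ath10k's real structures. */
struct txq_model {
    int pending;     /* frames currently occupying tx buffers */
    int max_pending; /* total tx buffers allowed */
    int low_water;   /* wake the queue once pending <= low_water */
    bool stopped;    /* true while the upper layer must not queue frames */
    int stop_calls;  /* number of stop events (stand-in for stop_queue cost) */
    int wake_calls;  /* number of wake events */
};

static void txq_init(struct txq_model *q, int max_pending, int low_water)
{
    assert(max_pending > 0);
    assert(low_water >= 0 && low_water < max_pending);
    q->pending = 0;
    q->max_pending = max_pending;
    q->low_water = low_water;
    q->stopped = false;
    q->stop_calls = 0;
    q->wake_calls = 0;
}

/* Upper layer offers one frame; returns false if the queue is stopped. */
static bool txq_enqueue(struct txq_model *q)
{
    if (q->stopped)
        return false;
    q->pending++;
    if (q->pending >= q->max_pending) {
        q->stopped = true;
        q->stop_calls++;
    }
    return true;
}

/* One tx completion: frees a buffer and wakes the queue at the threshold. */
static void txq_complete(struct txq_model *q)
{
    if (q->pending == 0)
        return;
    q->pending--;
    if (q->stopped && q->pending <= q->low_water) {
        q->stopped = false;
        q->wake_calls++;
    }
}

/*
 * Saturating sender: each round it queues frames until stopped, then
 * 'burst' frames complete. Returns how many times the queue was stopped.
 */
static int simulate_stops(int max_pending, int low_water, int burst, int rounds)
{
    struct txq_model q;

    txq_init(&q, max_pending, low_water);
    for (int r = 0; r < rounds; r++) {
        while (txq_enqueue(&q))
            ; /* fill until the queue reports stopped */
        for (int i = 0; i < burst; i++)
            txq_complete(&q);
    }
    return q.stop_calls;
}
```

Setting low_water to max_pending - 1 models the old behaviour (wake as soon as a single frame can be transmitted), and max_pending / 4 models the low-water mark described in the patch. With small completion bursts, the first setting stops the queue on every round in this model, while the second stops it only once per refill from the low-water mark. The trade-off raised in the thread still applies: the model says nothing about latency.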