On Tue, Apr 28, 2020 at 9:35 AM Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
> On 04/28/2020 09:27 AM, Dave Taht wrote:
> > On Tue, Apr 28, 2020 at 7:56 AM Steve deRosier <derosier@xxxxxxxxx> wrote:
> >>
> >> On Mon, Apr 27, 2020 at 7:54 AM <greearb@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> >>>
> >>> While running tcp upload + download tests with ~200
> >>> concurrent TCP streams, 1-2 processes, and 30 station
> >>> vdevs, I noticed that the __ieee80211_stop_queue was taking
> >>> around 20% of the CPU according to perf-top, which other locking
> >>> taking an additional ~15%.
> >>>
> >>> I believe the issue is that the ath10k driver would unlock the
> >>> txqueue when a single frame could be transmitted, instead of
> >>> waiting for a low water mark.
> >>>
> >>> So, this patch adds a low-water mark that is 1/4 of the total
> >>> tx buffers allowed.
> >>>
> >>> This appears to resolve the performance problem that I saw.
> >>>
> >>> Tested with recent wave-1 ath10k-ct firmware.
> >>>
> >>
> >> Hey Ben,
> >>
> >> Did you do any testing with this patch around latency? The nature of
> >> the thing that you fixed makes me wonder if it was intentional with
> >> respect to making WiFi fast - ie getting rid of buffers as much as
> >> possible. Obviously the CPU impact is likely to be an unintended
> >> consequence. In any case, I don't know anything for sure, it was just
> >> a thought that went through my head when reading this.
> >
> > I note that I'd prefer a BQL-like high/low watermark approach in
> > general... bytes, not packets, or better, perceived
> > airtime in a revision of AQL...
> >
> > ... but we'll try this patch, thx!
>
> Is there a nice diagram somewhere that shows where the various
> buffer-bloat technologies sit in the stack? I suspect such should
> be above the txqueue start/stop logic in the driver that I mucked
> with, and certainly the old behaviour has no obvious tie-in with
> any higher-level algorithms.

There are some good diagrams of the new queue stuff buried in Toke's
book and online papers, notably "Ending the Anomaly":

https://bufferbloat-and-beyond.net/

Plug: They just did a print run, and it makes for good bathroom reading.
There's also a preso on it around here somewhere.

That said, let's see... there are some rants in this:

http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf

and here ...

https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf

but that's mostly about what was wrong at the time.

Revising this piece would definitely be a good idea in light of modern
developments and increased knowledge:

https://www.linuxjournal.com/content/queueing-linux-network-stack

IMHO "how to use AQL" is underdocumented at the moment. I'd hoped to
produce some docs after we successfully got the iwl drivers to use it,
but we haven't got around to it, and merely getting the ath10k using it
(really really) well has eaten into my ax200 time.....

>
> I doubt my patch will change much except in pathological cases where
> the system is transmitting frames fast enough to fully fill the tx buffers
> and also loaded in such a way that it can process just a few tx frames
> at a time to keep bouncing to full and not full over and over.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@xxxxxxxxxxxxxxx>
> Candela Technologies Inc http://www.candelatech.com

--
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729
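
For anyone following the thread, the low-water mark idea in the quoted patch description can be sketched roughly as below. This is a toy userspace model, not the actual ath10k change: struct txq_model and the txq_* helpers are made-up names, and whether the real patch wakes at "at or below" or "strictly below" the mark is an assumption here. The only parts taken from the thread are the stop-when-full behaviour and the wake threshold of 1/4 of the total tx buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's tx bookkeeping; not ath10k's structs. */
struct txq_model {
    unsigned int max_pending; /* total tx buffers allowed */
    unsigned int low_water;   /* wake threshold: 1/4 of max_pending */
    unsigned int pending;     /* frames queued to hardware, not yet completed */
    bool stopped;             /* true while the upper layer is told to stop */
    unsigned int stop_calls;  /* number of stop transitions */
    unsigned int wake_calls;  /* number of wake transitions */
};

static void txq_init(struct txq_model *q, unsigned int max_pending)
{
    q->max_pending = max_pending;
    q->low_water = max_pending / 4;
    q->pending = 0;
    q->stopped = false;
    q->stop_calls = 0;
    q->wake_calls = 0;
}

/* Queue one frame; returns false if the queue is stopped or full. */
static bool txq_enqueue(struct txq_model *q)
{
    if (q->stopped || q->pending >= q->max_pending)
        return false;

    q->pending++;
    if (q->pending == q->max_pending) {
        /* Buffers exhausted: stop the queue (one stop transition). */
        q->stopped = true;
        q->stop_calls++;
    }
    return true;
}

/* One tx completion; wake only once we have drained to the low-water mark. */
static void txq_complete(struct txq_model *q)
{
    assert(q->pending > 0);
    q->pending--;

    /* Assumption: wake at or below the mark. Waking as soon as a single
     * slot frees up is the old behaviour the patch description objects to. */
    if (q->stopped && q->pending <= q->low_water) {
        q->stopped = false;
        q->wake_calls++;
    }
}
```

With 16 buffers in this model, the queue stops when the 16th frame is queued and stays stopped until completions bring the pending count down to 4, so a burst of completions costs one stop/wake pair instead of one per frame. That is the hysteresis that should cut the stop-queue and locking overhead; the latency question raised earlier in the thread is not something this sketch says anything about.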