On 04/28/2020 12:39 PM, Dave Taht wrote:
On Tue, Apr 28, 2020 at 12:37 PM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
On 04/28/2020 07:56 AM, Steve deRosier wrote:
On Mon, Apr 27, 2020 at 7:54 AM <greearb@xxxxxxxxxxxxxxx> wrote:
From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
While running TCP upload + download tests with ~200
concurrent TCP streams, 1-2 processes, and 30 station
vdevs, I noticed that __ieee80211_stop_queue was taking
around 20% of the CPU according to perf-top, with other locking
taking an additional ~15%.
I believe the issue is that the ath10k driver would unlock the
txqueue when a single frame could be transmitted, instead of
waiting for a low water mark.
So, this patch adds a low-water mark that is 1/4 of the total
tx buffers allowed.
This appears to resolve the performance problem that I saw.
Tested with recent wave-1 ath10k-ct firmware.
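For readers following along, the stop/wake behaviour described above can be sketched as a small stand-alone model. This is only an illustration, not the ath10k code: the struct, the function names and the wake_calls counter are all hypothetical, and the only detail taken from the patch description is that the wake threshold is 1/4 of the total tx buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tx queue with a low-water mark.
 * None of these names come from the ath10k driver. */
struct txq_model {
    int max_bufs;    /* total tx buffers allowed */
    int low_water;   /* wake threshold: 1/4 of max_bufs, per the patch description */
    int pending;     /* frames currently held by the firmware */
    bool stopped;    /* true while the upper-stack queue is stopped */
    int wake_calls;  /* number of stopped -> running transitions */
};

static void txq_init(struct txq_model *q, int max_bufs)
{
    assert(max_bufs >= 4);
    q->max_bufs = max_bufs;
    q->low_water = max_bufs / 4;
    q->pending = 0;
    q->stopped = false;
    q->wake_calls = 0;
}

/* Hand one frame to the firmware. Returns false if the queue is stopped.
 * The queue is stopped as soon as all buffers are in use. */
static bool txq_enqueue(struct txq_model *q)
{
    if (q->stopped)
        return false;
    q->pending++;
    if (q->pending >= q->max_bufs)
        q->stopped = true;
    return true;
}

/* One tx completion. Without a low-water mark the queue would be woken
 * here as soon as a single buffer is free; with it, the wake is delayed
 * until pending has drained down to low_water. */
static void txq_complete(struct txq_model *q)
{
    assert(q->pending > 0);
    q->pending--;
    if (q->stopped && q->pending <= q->low_water) {
        q->stopped = false;
        q->wake_calls++;
    }
}
```

With 16 buffers in this model, the queue stops when the 16th frame is queued and is woken once, when pending drains to 4, rather than being stopped and woken again on every completion while the ring stays nearly full.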
Hey Ben,
Did you do any testing with this patch around latency? The nature of
the thing that you fixed makes me wonder if it was intentional with
respect to making WiFi fast, i.e. getting rid of buffers as much as
possible. Obviously the CPU impact is likely to be an unintended
consequence. In any case, I don't know anything for sure, it was just
a thought that went through my head when reading this.
I did not, but on average my patch should make the queues be less full,
so I doubt it will hurt latency.
I would tend to agree with that.
Well, I don't, as it's dependent on right-sizing the ring in the first place.
My patch, barring strange issues elsewhere, can only make the firmware
tx queues less full on average.
If you want to test with different ring sizes, you can play with the tx_desc
setting in the ath10k-ct driver 'fwcfg' options.
http://www.candelatech.com/ath10k-10.4.php#config
My testing shows that overall throughput goes down when using lots of peers
if you have smaller numbers of txbuffers. This is because the firmware
will typically spread its buffers over lots of peers and have smaller ampdu
chains per transmit. An upper stack that more intelligently fed frames
to the firmware could mitigate this, and it is not all bad anyway since
giving everyone a 64-frame ampdu chain will increase burstiness at least somewhat.
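As a rough back-of-the-envelope illustration of the point above: if one assumes the firmware splits its tx buffers evenly across active peers (a simplifying assumption, real firmware scheduling is more involved), the achievable chain length per peer shrinks as the peer count grows. The function name and the even-split model are hypothetical; only the 64-frame cap comes from the message.

```c
#include <assert.h>

#define MAX_AMPDU_FRAMES 64  /* chain length mentioned in the message */

/* Hypothetical even-split model: each active peer gets an equal share
 * of the tx buffers, and a chain can be no longer than that share
 * (at least 1 frame, at most MAX_AMPDU_FRAMES). */
static int approx_ampdu_len(int tx_bufs, int active_peers)
{
    assert(tx_bufs > 0 && active_peers > 0);
    int per_peer = tx_bufs / active_peers;
    if (per_peer < 1)
        per_peer = 1;
    return per_peer < MAX_AMPDU_FRAMES ? per_peer : MAX_AMPDU_FRAMES;
}
```

Under this model, 512 buffers over 4 peers still allows full 64-frame chains, while 512 buffers over 32 peers allows only 16 frames per chain, and halving the buffers to 256 halves that again to 8.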
I've always envisioned that the stuff you and Toke and others have been
working on would help in this area, but I don't understand your stuff well
enough to know if that is true or not.
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com