On Tue, Apr 28, 2020 at 1:58 PM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>
> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>
> > On 04/28/2020 12:39 PM, Dave Taht wrote:
> >> On Tue, Apr 28, 2020 at 12:37 PM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
> >>>
> >>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
> >>>
> >>>> On 04/28/2020 07:56 AM, Steve deRosier wrote:
> >>>>> On Mon, Apr 27, 2020 at 7:54 AM <greearb@xxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> >>>>>>
> >>>>>> While running tcp upload + download tests with ~200
> >>>>>> concurrent TCP streams, 1-2 processes, and 30 station
> >>>>>> vdevs, I noticed that __ieee80211_stop_queue was taking
> >>>>>> around 20% of the CPU according to perf-top, with other
> >>>>>> locking taking an additional ~15%.
> >>>>>>
> >>>>>> I believe the issue is that the ath10k driver would unlock the
> >>>>>> txqueue when a single frame could be transmitted, instead of
> >>>>>> waiting for a low water mark.
> >>>>>>
> >>>>>> So, this patch adds a low-water mark that is 1/4 of the total
> >>>>>> tx buffers allowed.
> >>>>>>
> >>>>>> This appears to resolve the performance problem that I saw.
> >>>>>>
> >>>>>> Tested with recent wave-1 ath10k-ct firmware.
> >>>>>>
> >>>>>
> >>>>> Hey Ben,
> >>>>>
> >>>>> Did you do any testing with this patch around latency? The nature of
> >>>>> the thing that you fixed makes me wonder if it was intentional with
> >>>>> respect to making WiFi fast - ie getting rid of buffers as much as
> >>>>> possible. Obviously the CPU impact is likely to be an unintended
> >>>>> consequence. In any case, I don't know anything for sure, it was just
> >>>>> a thought that went through my head when reading this.
> >>>>
> >>>> I did not, but on average my patch should make the queues less full,
> >>>> so I doubt it will hurt latency.
> >>>
> >>> I would tend to agree with that.
> >>
> >> Well, I don't, as it's dependent on right-sizing the ring in the first place.
> >
> > My patch, barring strange issues elsewhere, can only make the firmware tx
> > queues less full on average.
> >
> > If you want to test with different ring sizes, you can play with the tx_desc
> > setting in the ath10k-ct driver 'fwcfg' options.
> >
> > http://www.candelatech.com/ath10k-10.4.php#config
> >
> > My testing shows that overall throughput goes down when using lots of peers
> > if you have smaller numbers of tx buffers. This is because the firmware
> > will typically spread its buffers over lots of peers and build smaller ampdu
> > chains per transmit. An upper stack that fed frames to the firmware more
> > intelligently could mitigate this, and it is not all bad anyway, since
> > giving everyone 64-frame ampdu chains will increase burstiness at least
> > somewhat.
>
> Making each transmission shorter is arguably the right thing to do in
> the "extremely congested" scenario, though. If you have to wait your
> turn behind 100 other stations for your next TXOP, you'd generally want
> each of those other stations to only transmit (say) 1ms instead of their
> full 4ms. Yes, this will hurt aggregate throughput somewhat, but I'd
> argue that in most cases the overall application performance would be
> better. You're right, though, that ideally this would be managed a bit
> smarter than by just running out of buffers :)
>
> > I've always envisioned that the stuff you and Toke and others have been
> > working on would help in this area, but I don't understand your stuff well
> > enough to know if that is true or not.
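[Aside on the patch at the top of the thread: a minimal sketch of the
low-water-mark wake logic it describes might look roughly like the
following. This is only an illustration, not the actual ath10k/ath10k-ct
code; the structure, field and function names, and the 1/4 threshold
expression are written out here just to show the idea.]

    /*
     * Illustrative sketch only -- not the real driver code.  Instead of
     * waking the mac80211 queues as soon as a single tx descriptor frees
     * up, keep them stopped until at least 1/4 of the pool is free again.
     */
    #include <stdbool.h>

    struct example_htt {
            int max_num_pending_tx; /* total tx descriptors available */
            int num_pending_tx;     /* descriptors currently in flight */
            bool queues_stopped;    /* have we asked mac80211 to stop? */
    };

    /* Called from the tx-completion path when one descriptor is freed. */
    static void example_tx_completed(struct example_htt *htt)
    {
            int free_descs;

            htt->num_pending_tx--;
            free_descs = htt->max_num_pending_tx - htt->num_pending_tx;

            /*
             * Old behaviour: wake as soon as one descriptor is free,
             * which can bounce stop/wake on every completion under load.
             * Low-water behaviour: wait for 1/4 of the pool to be free.
             */
            if (htt->queues_stopped &&
                free_descs >= htt->max_num_pending_tx / 4) {
                    htt->queues_stopped = false;
                    /* a real driver would call ieee80211_wake_queues() here */
            }
    }

[The trade-off discussed in this thread is visible in the sketch: a
stopped queue stays stopped a little longer, in exchange for far fewer
stop/wake transitions and less of the __ieee80211_stop_queue overhead
seen in perf-top.]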
>
> It might, although as per above I'm not quite sure what "helps" really
> means in this context. What would you expect good behaviour to be
> here? I think what you're alluding to is to limit the total number of
> stations that will be allowed to have outstanding data in the firmware
> at the same time, right? Here it would help a bit to know some more
> details of how the firmware manages its internal queueing, and how it
> schedules stations (if at all)?
>
> BTW, are you running any of these tests with AQL enabled?
>

I don't know if Ben is doing so, but I will be doing so here very soon.
I've got some unrelated things in the way to clear up first, but within
a couple of weeks we hope to be testing AQL with ath10k-ct firmware and
driver.

And thanks, everyone, for the discussion; it's been very interesting and
useful to me. Hopefully we can improve ath10k even more with this
information.

- Steve
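[For readers unfamiliar with AQL (Airtime Queue Limits), the rough idea
is that mac80211 tracks an estimate of how much airtime each station
already has queued toward the driver/firmware and stops releasing
further frames for that station once the estimate crosses a limit. The
sketch below is only a conceptual illustration of that description; the
names, the structure, and the 8 ms budget are invented here and are not
mac80211's actual implementation.]

    /*
     * Conceptual AQL sketch only -- not mac80211 code.  Per station,
     * account the estimated airtime of frames handed to the driver and
     * refuse to release more once the pending estimate exceeds a budget.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define EXAMPLE_AQL_STA_LIMIT_US 8000   /* hypothetical per-sta budget */

    struct example_sta {
            uint32_t pending_airtime_us;    /* queued but not yet completed */
    };

    /* May more frames be released to the driver for this station? */
    static bool example_aql_may_release(const struct example_sta *sta)
    {
            return sta->pending_airtime_us < EXAMPLE_AQL_STA_LIMIT_US;
    }

    /* Account a frame when it is handed to the driver... */
    static void example_aql_on_enqueue(struct example_sta *sta,
                                       uint32_t est_airtime_us)
    {
            sta->pending_airtime_us += est_airtime_us;
    }

    /* ...and return the budget when tx-status reports it done. */
    static void example_aql_on_tx_status(struct example_sta *sta,
                                         uint32_t est_airtime_us)
    {
            if (sta->pending_airtime_us >= est_airtime_us)
                    sta->pending_airtime_us -= est_airtime_us;
            else
                    sta->pending_airtime_us = 0;
    }

[In the real kernel the AQL limits are tunable at runtime (via mac80211
debugfs on kernels that expose it), which is presumably part of what the
AQL testing mentioned above would exercise.]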