Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 2/21/19 8:10 AM, Kalle Valo wrote:
>> Toke Høiland-Jørgensen <toke@xxxxxxx> writes:
>>
>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>
>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>>>>>
>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>
>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>> environment where the time series graph shows that throughput is
>>>>>>> all over the place for the duration of the test; so it's hard to
>>>>>>> draw solid conclusions from (for instance, for the 5-stream test,
>>>>>>> the average throughput for 6 is 331 and 379 Mbps for the two
>>>>>>> repetitions, and for 7 it's 326 and 371 Mbps). Unfortunately I
>>>>>>> don't have the same hardware used in this test, so I can't go
>>>>>>> verify it myself; so the only thing I can do is grumble about it
>>>>>>> here... :)
>>>>>>
>>>>>> It's a fair complaint and I agree with it. My counter-argument is
>>>>>> that the opposite is true too: most ideal benchmarks don't measure
>>>>>> what most users see. While the data wgong provided is way more
>>>>>> noisy than I'd like, my overall "confidence" in the "conclusion" I
>>>>>> offered is still positive.
>>>>>
>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>> evaluation to base a 4x increase in buffer size on...
>>>>
>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>
>>>> Toke, what else would you like to see evaluated?
>>>>
>>>> I generally want to see three things measured when "benchmarking"
>>>> technologies: throughput, latency, CPU utilization. We've covered
>>>> those three, I think, "reasonably".
>>>
>>> Hmm, going back and looking at this (I'd completely forgotten about
>>> this patch), I think I had two main concerns:
>>>
>>> 1. What happens in a degraded-signal situation, where throughput is
>>>    limited by the signal conditions, or by contention with other
>>>    devices? Both of these happen regularly, and I worry that latency
>>>    will be badly affected under those conditions.
>>>
>>> 2. What happens with old hardware that has worse buffer management in
>>>    the driver->firmware path (especially drivers without push/pull
>>>    mode support)? For these, the lower-level queueing structure is
>>>    less effective at controlling queueing latency.
>>
>> Do note that this patch changes behaviour _only_ for QCA6174 and
>> QCA9377 PCI devices, which IIRC do not even support push/pull mode.
>> All the rest, including QCA988X and QCA9984, are unaffected.
>
> Just as a note, at least kernels such as 4.14.whatever perform poorly
> when running ath10k on 9984 when acting as TCP endpoints. This makes
> them not really usable for stuff like serving video to lots of clients.
>
> Tweaking TCP (I do it a bit differently, but either way) can
> significantly improve performance.

Differently how? Did you have to do more than fiddle with the
pacing_shift?

> Recently I helped a user that could get barely 70 stations streaming
> at 1Mbps on a stock kernel (using one wave-1 radio on 2.4GHz and one
> wave-2 on 5GHz), and we got 110 working with a tweaked TCP stack.
> These were /n stations too.
>
> I think it is lame that it _still_ requires out-of-tree patches to
> make TCP work well on ath10k... even if you want to default to the
> current behaviour, you should allow users to tweak it to work with
> their use case.
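For anyone skimming the thread: the tweak being discussed is the TSQ
pacing shift that mac80211 lets a driver override before it registers.
A minimal sketch of what the ath10k patch in question amounts to,
assuming the stock tx_sk_pacing_shift hook (the function name below is
just a placeholder, not anything in the driver):

#include <net/mac80211.h>

/* Somewhere in the driver's setup path, before ieee80211_register_hw().
 * tx_sk_pacing_shift is the mac80211 field involved; "example_setup"
 * is a made-up name for illustration.
 */
static void example_setup(struct ieee80211_hw *hw)
{
        /* TSQ keeps roughly sk_pacing_rate >> shift bytes (about
         * 1/2^shift seconds of data) queued below each TCP socket.
         * mac80211's default at the time is 8 (~4 ms of data); 6
         * allows ~16 ms, i.e. the "4x increase in buffer size"
         * mentioned above.
         */
        hw->tx_sk_pacing_shift = 6;
}

Lowering the shift lets TCP keep more un-ACKed data queued below the
socket, which helps the firmware build larger aggregates but adds
exactly the kind of latency being argued about above.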
Well, if TCP is broken to the point of being unusable, I do think we should fix it; but I think "just provide a configuration knob" should be the last resort...

-Toke
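P.S. For concreteness, the kind of per-driver knob being asked for would
presumably be a module parameter plumbed into that same field. This is a
hypothetical sketch only; the parameter name and helper do not exist in
ath10k:

#include <linux/kernel.h>
#include <linux/module.h>
#include <net/mac80211.h>

/* Hypothetical tunable; the name and helper are invented for illustration. */
static unsigned int tcp_pacing_shift = 8;
module_param(tcp_pacing_shift, uint, 0444);
MODULE_PARM_DESC(tcp_pacing_shift,
                 "sk_pacing_shift applied to TCP sockets sending through this device");

static void example_apply_pacing_param(struct ieee80211_hw *hw)
{
        /* Clamp to a sane range; read once at registration time, so
         * changing it means reloading the module.
         */
        hw->tx_sk_pacing_shift = clamp(tcp_pacing_shift, 4u, 10u);
}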