Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 2/21/19 9:15 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>>
>>> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>>>>
>>>>> On 2/21/19 8:10 AM, Kalle Valo wrote:
>>>>>> Toke Høiland-Jørgensen <toke@xxxxxxx> writes:
>>>>>>
>>>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>>>
>>>>>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>>>>>
>>>>>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>>>>>> environment where the time series graph shows that throughput is all
>>>>>>>>>>> over the place for the duration of the test; so it's hard to draw solid
>>>>>>>>>>> conclusions from (for instance, for the 5-stream test, the average
>>>>>>>>>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>>>>>>>>>>> it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
>>>>>>>>>>> used in this test, so I can't go verify it myself; so the only thing I
>>>>>>>>>>> can do is grumble about it here... :)
>>>>>>>>>>
>>>>>>>>>> It's a fair complaint and I agree with it. My counter-argument is that
>>>>>>>>>> the opposite is true too: most ideal benchmarks don't measure what most
>>>>>>>>>> users see. While the data wgong provided are way more noisy than I'd
>>>>>>>>>> like, my overall "confidence" in the "conclusion" I offered is still
>>>>>>>>>> positive.
>>>>>>>>>
>>>>>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>>>>>> evaluation to base a 4x increase in buffer size on...
>>>>>>>>
>>>>>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>>>>>
>>>>>>>> Toke, what else would you like to see evaluated?
>>>>>>>>
>>>>>>>> I generally want to see three things measured when "benchmarking"
>>>>>>>> technologies: throughput, latency, and CPU utilization.
>>>>>>>> We've covered those three, I think, "reasonably".
>>>>>>>
>>>>>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>>>>>> patch), I think I had two main concerns:
>>>>>>>
>>>>>>> 1. What happens in a degraded signal situation, where the throughput is
>>>>>>>    limited by the signal conditions, or by contention with other devices.
>>>>>>>    Both of these happen regularly, and I worry that latency will be
>>>>>>>    badly affected under those conditions.
>>>>>>>
>>>>>>> 2. What happens with old hardware that has worse buffer management in
>>>>>>>    the driver->firmware path (especially drivers without push/pull mode
>>>>>>>    support)? For these, the lower-level queueing structure is less
>>>>>>>    effective at controlling queueing latency.
>>>>>>
>>>>>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>>>>>> PCI devices, which IIRC do not even support push/pull mode. All the
>>>>>> rest, including QCA988X and QCA9984, are unaffected.
>>>>>
>>>>> Just as a note, at least kernels such as 4.14.whatever perform poorly
>>>>> when running ath10k on 9984 acting as a TCP endpoint. This makes them
>>>>> not really usable for stuff like serving video to lots of clients.
>>>>>
>>>>> Tweaking TCP (I do it a bit differently, but either way) can
>>>>> significantly improve performance.
>>>>
>>>> Differently how? Did you have to do more than fiddle with the pacing_shift?
>>>
>>> This one, or a slightly tweaked version that applies to different kernels:
>>>
>>> https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5
>>
>> Right; but the current mac80211 default (pacing shift 8) corresponds to
>> setting your sysctl to 4...
>>
>>>>> Recently I helped a user who could get barely 70 stations streaming
>>>>> at 1 Mbps on a stock kernel (using one wave-1 radio on 2.4 GHz and one
>>>>> wave-2 on 5 GHz), and we got 110 working with a tweaked TCP stack.
>>>>> These were /n stations too.
>>>>>
>>>>> I think it is lame that it _still_ requires out-of-tree patches to
>>>>> make TCP work well on ath10k... Even if you want to default to the
>>>>> current behaviour, you should allow users to tweak it to work with
>>>>> their use case.
>>>>
>>>> Well, if TCP is broken to the point of being unusable, I do think we
>>>> should fix it; but I think "just provide a configuration knob" should
>>>> be the last resort...
>>>
>>> So, it has been broken for years, and waiting for a perfect solution
>>> has not gotten the problem fixed.
>>
>> Well, the current default should at least be closer to something that
>> works well.
>>
>> I do think I may have erred on the wrong side of the optimum when I
>> submitted the original patch to set the default to 8; that should
>> probably have been 7 (i.e., 8 ms; the optimum in the evaluation we did
>> was around 6 ms, which is sadly not a power of two). Maybe changing that
>> default is actually better than having to redo the testing for all the
>> different devices, as we're discussing in the context of this patch.
>> Maybe I should just send a patch to do that...
>
> It is hubris to think one setting works well for everyone. Sure, set a
> good default, but also let people tune the value.

Well, I certainly won't object to a knob; I just don't think that most
people are going to use it, so we had better make the default reasonable.

> And send the patches to stable so that users on older kernels can have
> good performance.

Sure, I can submit stable backports if needed :)

-Toke
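For reference, a minimal sketch of the arithmetic behind the shift values
discussed above (this is not the mac80211 or ath10k code; the 500 Mbit/s
pacing rate is just an assumed figure for illustration): a pacing shift of
s allows roughly 2^-s seconds of data to be queued below the TCP stack, so
shift 8 is ~4 ms, shift 7 is ~8 ms, and shift 6 is ~16 ms.

#include <stdio.h>

/*
 * Sketch of the pacing-shift trade-off: the stack caps locally queued
 * data at roughly (pacing_rate >> shift) bytes, i.e. about 2^-shift
 * seconds of traffic at the current pacing rate.  The 500 Mbit/s rate
 * below is an assumption for the example, not a measurement.
 */
int main(void)
{
	const unsigned long rate_bytes_per_sec = 500000000UL / 8;

	for (int shift = 6; shift <= 10; shift++) {
		double ms = 1000.0 / (double)(1UL << shift);
		unsigned long kbytes = (rate_bytes_per_sec >> shift) / 1024;

		printf("shift %2d -> ~%5.2f ms of queue (~%4lu KB at 500 Mbit/s)\n",
		       shift, ms, kbytes);
	}
	return 0;
}

Since only powers of two are expressible, neither shift 7 (~8 ms) nor
shift 8 (~4 ms) lands exactly on the ~6 ms optimum mentioned above.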