Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 2/21/19 9:15 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>>
>>> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
>>>>
>>>>> On 2/21/19 8:10 AM, Kalle Valo wrote:
>>>>>> Toke Høiland-Jørgensen <toke@xxxxxxx> writes:
>>>>>>
>>>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>>>
>>>>>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>>>>>
>>>>>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>>>>>> environment where the time series graph shows that throughput is all
>>>>>>>>>>> over the place for the duration of the test; so it's hard to draw solid
>>>>>>>>>>> conclusions from (for instance, for the 5-stream test, the average
>>>>>>>>>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>>>>>>>>>>> it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
>>>>>>>>>>> used in this test, so I can't go verify it myself; so the only thing I
>>>>>>>>>>> can do is grumble about it here... :)
>>>>>>>>>>
>>>>>>>>>> It's a fair complaint and I agree with it. My counter-argument is that
>>>>>>>>>> the opposite is true too: most ideal benchmarks don't measure what most
>>>>>>>>>> users see. While the data wgong provided are way more noisy than I'd
>>>>>>>>>> like, my overall "confidence" in the "conclusion" I offered is still
>>>>>>>>>> positive.
>>>>>>>>>
>>>>>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>>>>>> evaluation to base a 4x increase in buffer size on...
>>>>>>>>
>>>>>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>>>>>
>>>>>>>> Toke, what else would you like to see evaluated?
>>>>>>>>
>>>>>>>> I generally want to see three things measured when "benchmarking"
>>>>>>>> technologies: throughput, latency, and CPU utilization.
>>>>>>>> We've covered those three, I think, "reasonably".
>>>>>>>
>>>>>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>>>>>> patch), I think I had two main concerns:
>>>>>>>
>>>>>>> 1. What happens in a degraded signal situation, where the throughput is
>>>>>>>    limited by the signal conditions, or by contention with other devices.
>>>>>>>    Both of these happen regularly, and I worry that latency will be
>>>>>>>    badly affected under those conditions.
>>>>>>>
>>>>>>> 2. What happens with old hardware that has worse buffer management in
>>>>>>>    the driver->firmware path (especially drivers without push/pull mode
>>>>>>>    support)? For these, the lower-level queueing structure is less
>>>>>>>    effective at controlling queueing latency.
>>>>>>
>>>>>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>>>>>> PCI devices, which IIRC do not even support push/pull mode. All the
>>>>>> rest, including QCA988X and QCA9984, are unaffected.
>>>>>
>>>>> Just as a note, at least kernels such as 4.14.whatever perform poorly
>>>>> when running ath10k on 9984 acting as a TCP endpoint. This makes them
>>>>> not really usable for stuff like serving video to lots of clients.
>>>>>
>>>>> Tweaking TCP (I do it a bit differently, but either way) can
>>>>> significantly improve performance.
>>>>
>>>> Differently how? Did you have to do more than fiddle with the pacing_shift?
>>>
>>> This one, or a slightly tweaked version that applies to different kernels:
>>>
>>> https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5
>>
>> Right; but the current mac80211 default (pacing shift 8) corresponds to
>> setting your sysctl to 4...
>>
>>>>> Recently I helped a user who could get barely 70 stations streaming
>>>>> at 1 Mbps on a stock kernel (using one wave-1 radio on 2.4 GHz and one
>>>>> wave-2 on 5 GHz), and we got 110 working with a tweaked TCP stack.
>>>>> These were /n stations too.
>>>>>
>>>>> I think it is lame that it _still_ requires out-of-tree patches to
>>>>> make TCP work well on ath10k... Even if you want to default to the
>>>>> current behaviour, you should allow users to tweak it to work with
>>>>> their use case.
>>>>
>>>> Well, if TCP is broken to the point of being unusable, I do think we
>>>> should fix it; but I think "just provide a configuration knob" should
>>>> be the last resort...
>>>
>>> So, it has been broken for years, and waiting for a perfect solution
>>> has not gotten the problem fixed.
>>
>> Well, the current default should at least be closer to something that
>> works well.
>>
>> I do think I may have erred on the wrong side of the optimum when I
>> submitted the original patch to set the default to 8; that should
>> probably have been 7 (i.e., 8 ms; the optimum in the evaluation we did
>> was around 6 ms, which is sadly not a power of two). Maybe changing that
>> default is actually better than having to redo the testing for all the
>> different devices, as we're discussing in the context of this patch.
>> Maybe I should just send a patch to do that...
>
> It is hubris to think one setting works well for everyone. Sure, set a
> good default, but also let people tune the value.

Well, I certainly won't object to a knob; I just don't think that most
people are going to use it, so we had better make the default reasonable.

> And send the patches to stable so that users on older kernels can have
> good performance.

Sure, I can submit stable backports if needed :)

-Toke
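For reference, a minimal sketch of the arithmetic behind the shift values
discussed above (this is not the mac80211 or ath10k code; the 500 Mbit/s
pacing rate is just an assumed figure for illustration): a pacing shift of
s allows roughly 2^-s seconds of data to be queued below the TCP stack, so
shift 8 is ~4 ms, shift 7 is ~8 ms, and shift 6 is ~16 ms.

#include <stdio.h>

/*
 * Sketch of the pacing-shift trade-off: the stack caps locally queued
 * data at roughly (pacing_rate >> shift) bytes, i.e. about 2^-shift
 * seconds of traffic at the current pacing rate.  The 500 Mbit/s rate
 * below is an assumption for the example, not a measurement.
 */
int main(void)
{
	const unsigned long rate_bytes_per_sec = 500000000UL / 8;

	for (int shift = 6; shift <= 10; shift++) {
		double ms = 1000.0 / (double)(1UL << shift);
		unsigned long kbytes = (rate_bytes_per_sec >> shift) / 1024;

		printf("shift %2d -> ~%5.2f ms of queue (~%4lu KB at 500 Mbit/s)\n",
		       shift, ms, kbytes);
	}
	return 0;
}

Since only powers of two are expressible, neither shift 7 (~8 ms) nor
shift 8 (~4 ms) lands exactly on the ~6 ms optimum mentioned above.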