Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 2/21/19 8:10 AM, Kalle Valo wrote:
>> Toke Høiland-Jørgensen <toke@xxxxxxx> writes:
>>
>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>
>>>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>>>>>
>>>>> Grant Grundler <grundler@xxxxxxxxxx> writes:
>>>>>
>>>>>>> And, well, Grant's data is from a single test in a noisy
>>>>>>> environment where the time series graph shows that throughput is
>>>>>>> all over the place for the duration of the test; so it's hard to
>>>>>>> draw solid conclusions from (for instance, for the 5-stream test,
>>>>>>> the average throughput for 6 is 331 and 379 Mbps for the two
>>>>>>> repetitions, and for 7 it's 326 and 371 Mbps). Unfortunately I
>>>>>>> don't have the same hardware used in this test, so I can't go
>>>>>>> verify it myself; so the only thing I can do is grumble about it
>>>>>>> here... :)
>>>>>>
>>>>>> It's a fair complaint and I agree with it. My counter-argument is
>>>>>> that the opposite is true too: most ideal benchmarks don't measure
>>>>>> what most users see. While the data wgong provided is way more
>>>>>> noisy than I'd like, my overall "confidence" in the "conclusion" I
>>>>>> offered is still positive.
>>>>>
>>>>> Right. I guess I would just prefer a slightly more comprehensive
>>>>> evaluation to base a 4x increase in buffer size on...
>>>>
>>>> Kalle, is this why you didn't accept this patch? Other reasons?
>>>>
>>>> Toke, what else would you like to see evaluated?
>>>>
>>>> I generally want to see three things measured when "benchmarking"
>>>> technologies: throughput, latency, CPU utilization. We've covered
>>>> those three, I think, "reasonably".
>>>
>>> Hmm, going back and looking at this (I'd completely forgotten about
>>> this patch), I think I had two main concerns:
>>>
>>> 1. What happens in a degraded-signal situation, where throughput is
>>>    limited by the signal conditions, or by contention with other
>>>    devices? Both of these happen regularly, and I worry that latency
>>>    will be badly affected under those conditions.
>>>
>>> 2. What happens with old hardware that has worse buffer management in
>>>    the driver->firmware path (especially drivers without push/pull
>>>    mode support)? For these, the lower-level queueing structure is
>>>    less effective at controlling queueing latency.
>>
>> Do note that this patch changes behaviour _only_ for QCA6174 and
>> QCA9377 PCI devices, which IIRC do not even support push/pull mode.
>> All the rest, including QCA988X and QCA9984, are unaffected.
>
> Just as a note, at least kernels such as 4.14.whatever perform poorly
> when running ath10k on 9984 when acting as TCP endpoints. This makes
> them not really usable for stuff like serving video to lots of clients.
>
> Tweaking TCP (I do it a bit differently, but either way) can
> significantly improve performance.

Differently how? Did you have to do more than fiddle with the
pacing_shift?

> Recently I helped a user that could get barely 70 stations streaming
> at 1Mbps on a stock kernel (using one wave-1 radio on 2.4GHz and one
> wave-2 on 5GHz), and we got 110 working with a tweaked TCP stack.
> These were /n stations too.
>
> I think it is lame that it _still_ requires out-of-tree patches to
> make TCP work well on ath10k... even if you want to default to the
> current behaviour, you should allow users to tweak it to work with
> their use case.
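For anyone skimming the thread: the tweak being discussed is the TSQ
pacing shift that mac80211 lets a driver override before it registers.
A minimal sketch of what the ath10k patch in question amounts to,
assuming the stock tx_sk_pacing_shift hook (the function name below is
just a placeholder, not anything in the driver):

#include <net/mac80211.h>

/* Somewhere in the driver's setup path, before ieee80211_register_hw().
 * tx_sk_pacing_shift is the mac80211 field involved; "example_setup"
 * is a made-up name for illustration.
 */
static void example_setup(struct ieee80211_hw *hw)
{
        /* TSQ keeps roughly sk_pacing_rate >> shift bytes (about
         * 1/2^shift seconds of data) queued below each TCP socket.
         * mac80211's default at the time is 8 (~4 ms of data); 6
         * allows ~16 ms, i.e. the "4x increase in buffer size"
         * mentioned above.
         */
        hw->tx_sk_pacing_shift = 6;
}

Lowering the shift lets TCP keep more un-ACKed data queued below the
socket, which helps the firmware build larger aggregates but adds
exactly the kind of latency being argued about above.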
Well, if TCP is broken to the point of being unusable, I do think we should fix it; but I think "just provide a configuration knob" should be the last resort...

-Toke
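P.S. For concreteness, the kind of per-driver knob being asked for would
presumably be a module parameter plumbed into that same field. This is a
hypothetical sketch only; the parameter name and helper do not exist in
ath10k:

#include <linux/kernel.h>
#include <linux/module.h>
#include <net/mac80211.h>

/* Hypothetical tunable; the name and helper are invented for illustration. */
static unsigned int tcp_pacing_shift = 8;
module_param(tcp_pacing_shift, uint, 0444);
MODULE_PARM_DESC(tcp_pacing_shift,
                 "sk_pacing_shift applied to TCP sockets sending through this device");

static void example_apply_pacing_param(struct ieee80211_hw *hw)
{
        /* Clamp to a sane range; read once at registration time, so
         * changing it means reloading the module.
         */
        hw->tx_sk_pacing_shift = clamp(tcp_pacing_shift, 4u, 10u);
}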