Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

Ben Greear <greearb@xxxxxxxxxxxxxxx> · Thu, 21 Feb 2019 08:57:51 -0800

On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

On 2/21/19 8:10 AM, Kalle Valo wrote:
Toke Høiland-Jørgensen <toke@xxxxxxx> writes:

Grant Grundler <grundler@xxxxxxxxxx> writes:

On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:

Grant Grundler <grundler@xxxxxxxxxx> writes:

And, well, Grant's data is from a single test in a noisy
environment where the time series graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)

It's a fair complaint and I agree with it. My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.

Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...
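
As a rough illustration of where the "4x" figure comes from: TCP small
queues limits the data a socket may have queued below the stack to roughly
sk_pacing_rate >> sk_pacing_shift, so each step down in the shift doubles
the budget. The sketch below assumes the baseline shift is 8 (the value
mac80211 sets; it is not stated in this thread), uses a hypothetical pacing
rate of 50,000,000 bytes/s (about 400 Mbps), and ignores the kernel's
minimum floor and sysctl cap on the limit. The function names are made up
for this example.

```python
def tsq_budget_bytes(pacing_rate_bytes_per_s: int, pacing_shift: int) -> int:
    """Approximate per-socket queue budget in bytes: rate >> shift."""
    return pacing_rate_bytes_per_s >> pacing_shift


def tsq_budget_ms(pacing_shift: int) -> float:
    """Same budget expressed as time at the pacing rate: 1000 / 2**shift ms."""
    return 1000.0 / (1 << pacing_shift)


if __name__ == "__main__":
    rate = 50_000_000  # hypothetical pacing rate, bytes per second (~400 Mbps)
    for shift in (8, 6):  # 8: assumed baseline, 6: value proposed in the patch
        print(
            f"shift={shift}: {tsq_budget_bytes(rate, shift)} bytes, "
            f"~{tsq_budget_ms(shift):.2f} ms of data"
        )
    # Ratio between the two time budgets is 2**(8 - 6).
    print("ratio:", tsq_budget_ms(6) / tsq_budget_ms(8))
```

Under those assumptions the budget grows from about 4 ms to about 16 ms of
data per socket, which is the increase being questioned here.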

Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, and CPU utilization.
I think we've covered those three "reasonably".

Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
   limited by the signal conditions, or by contention with other devices.
   Both of these happen regularly, and I worry that latency will be
   badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
   the driver->firmware path (especially drivers without push/pull mode
   support)? For these, the lower-level queueing structure is less
   effective at controlling queueing latency.

Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.

Just as a note, at least kernels such as 4.14.whatever perform poorly when
running ath10k on 9984 when acting as TCP endpoints.  This makes them not
really usable for stuff like serving video to lots of clients.

Tweaking TCP (I do it a bit differently, but either way) can significantly
improve performance.

Differently how? Did you have to do more than fiddle with the pacing_shift?

This one, or a slightly tweaked version that applies to different kernels:

https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5

Recently I helped a user that could get barely 70 stations streaming
at 1 Mbps on a stock kernel (using one wave-1 on 2.4 GHz, one wave-2 on
5 GHz), and we got 110 working with a tweaked TCP stack. These were
802.11n stations too.

I think it is lame that it _still_ requires out-of-tree patches to
make TCP work well on ath10k...even if you want to default to current
behaviour, you should allow users to tweak it to work with their use
case.

Well if TCP is broken to the point of being unusable I do think we
should fix it; but I think "just provide a configuration knob" should be
the last resort...

So, it has been broken for years, and waiting for a perfect solution has not
gotten the problem fixed.

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com