Re: [RFC] ath10k: implement dql for htt tx

Michal Kazior <michal.kazior@xxxxxxxxx> · Fri, 1 Apr 2016 10:01:24 +0200

Re-posting text only as it was blocked by most mailing list servers:

The original attachment can be fetched at:
  http://kazikcz.github.io/dl/2016-04-01-flent-ath10k-dql.tar.gz

On 25 March 2016 at 10:55, Michal Kazior <michal.kazior@xxxxxxxxx> wrote:
> On 25 March 2016 at 10:39, Michal Kazior <michal.kazior@xxxxxxxxx> wrote:
>> This implements a very naive dynamic queue limits
>> on the flat HTT Tx. In some of my tests (using
>> flent) it seems to reduce induced latency by
>> orders of magnitude (e.g. when enforcing 6mbps
>> tx rates 2500ms -> 150ms). But at the same time it
>> introduces TCP throughput buildup over time
>> (instead of immediate bump to max). More
>> importantly I didn't observe it to make things
>> much worse (yet).
>>
>> Signed-off-by: Michal Kazior <michal.kazior@xxxxxxxxx>
>> ---
>>
>> I'm not sure yet if it's worth to consider this
>> patch for merging per se. My motivation was to
>> have something to prove mac80211 fq works and to
>> see if DQL can learn the proper queue limit in
>> face of wireless rate control at all.
>>
>> I'll do a follow up post with flent test results
>> and some notes.
>
> Here's a short description what-is-what test naming:
>  - sw/fq contains only txq/flow stuff (no scheduling, no txop queue limits)
>  - sw/ath10k_dql contains only ath10k patch which applies DQL to
> driver-firmware tx queue naively
>  - sw/fq+ath10k_dql is obvious
>  - sw/base today's ath.git/master checkout used as base
>  - "veryfast" tests TCP tput to reference receiver (4 antennas)
>  - "fast" tests TCP tput to ref receiver (1 antenna)
>  - "slow" tests TCP tput to ref receiver (1 *unplugged* antenna)
>  - "fast+slow" tests sharing between "fast" and "slow"
>  - "autorate" uses default rate control
>  - "rate6m" uses fixed-tx-rate at 6mbps
>  - the test uses QCA9880 w/ 10.1.467
>  - no rrul tests, sorry Dave! :)
>
> \
> Observations / conclusions:
>  - DQL builds up throughput slowly on "veryfast"; in some tests it
> doesn't get to reach peak (roughly 210mbps average) because the test
> is too short
>
>  - DQL shows better latency results in almost all cases compared to
> the txop based scheduling from my mac80211 RFC (but i haven't
> thoroughly looked at *all* the data; I might've missed a case where it
> performs worse)
>
>  - latency improvement seen on sw/ath10k_dql @ rate6m,fast compared to
> sw/base (1800ms -> 160ms) can be explained by the fact that txq AC
> limit is 256 and since all TCP streams run on BE (and fq_codel as the
> qdisc) the induced txq latency is 256 * (1500 / (6*1024*1024/8.)) / 4
> = ~122ms which is pretty close to the test data (the formula ignores
> MAC overhead, so the latency in practice is larger). Once you consider
> the overhead and in-flight packets on driver-firmware tx queue 160ms
> doesn't seem strange. Moreover when you compare the same case with
> sw/fq+ath10k_dql you can clearly see the advantage of having fq_codel
> in mac80211 software queuing - the latency drops by (another) order of
> magnitude because now incomming ICMPs are treated as new, bursty flows
> and get fed to the device quickly.
>
>  - slow+fast case still sucks but that's expected because DQL hasn't
> been applied per-station
>
>  - sw/fq has lower peak throughput ("veryfast") compared to sw/base
> (this actually proves current - and very young least to say - ath10k
> wake-tx-queue implementation is deficient; ath10k_dql improves it and
> sw/fq+ath10k_dql climbs up to the max throughput over time)
>
>
> To sum things up:
>  - DQL might be able to replace the explicit txop queue limiting
> (which requires rate control info)
>  - mac80211 fair queuing works
>
>
> A few plots for quick and easy reference:
>
>   http://imgur.com/a/TnvbQ

Michał
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html