On 14 February 2018 01:43:25 CET, Ryan Hsu <ryanhsu@xxxxxxxxxxxxxxxx> wrote:
>On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:
>
>> Since we now have the convenient helper to do so, actually adjust the
>> TSQ pacing shift for packets going out over a WiFi interface. This
>> significantly improves throughput for locally-originated TCP
>> connections. The default pacing shift of 10 corresponds to ~1ms of
>> queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
>> 1-hop throughput for ath9k by a factor of 3, whereas increasing it more
>> has diminishing returns.
>>
>> Achieved throughput for different values of sk_pacing_shift (average of
>> 5 iterations of 10-sec netperf runs to a host on the other side of the
>> WiFi hop):
>>
>> sk_pacing_shift 10:  43.21 Mbps (pre-patch)
>> sk_pacing_shift  9:  78.17 Mbps
>> sk_pacing_shift  8: 123.94 Mbps
>> sk_pacing_shift  7: 128.31 Mbps
>>
>> Latency for competing flows increases from ~3 ms to ~10 ms with this
>> change. This is about the same magnitude of queueing latency induced by
>> flows that are not originated on the WiFi device itself (and so are not
>> limited by TSQ).
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxx>
>> ---
>>  net/mac80211/tx.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 25904af38839..69722504e3e1 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
>>  	if (!IS_ERR_OR_NULL(sta)) {
>>  		struct ieee80211_fast_tx *fast_tx;
>>
>> +		/* We need a bit of data queued to build aggregates properly, so
>> +		 * instruct the TCP stack to allow more than a single ms of data
>> +		 * to be queued in the stack. The value is a bit-shift of 1
>> +		 * second, so 8 is ~4ms of queued data. Only affects local TCP
>> +		 * sockets.
>> +		 */
>> +		sk_pacing_shift_update(skb->sk, 8);
>> +
>>  		fast_tx = rcu_dereference(sta->fast_tx);
>>
>>  		if (fast_tx &&
>
>I know that increasing the value doesn't help much beyond 8 for ath9k,
>but I ran a test on ath10k where 6 or 7 gives the optimal numbers.
>Since an ath10k/11ac device has higher bandwidth than ath9k/11n, can we
>consider using 6 or 7 to accommodate that?
>
>shift   tx (mbps)   cpu usage (%)
>5       404         28.5
>6       398         13.8
>7       401         8
>8       378         5
>9       230         4.5
>10      79.6        2

Why does the CPU usage go up so sharply for shift values below 7? Also,
what is the latency impact of each of those values?

-Toke
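P.S. For reference, here is how the shift maps to queued data: as the
comment in the patch says, sk_pacing_shift expresses the TSQ limit as a
bit-shift of one second's worth of data at the current pacing rate, so
the stack allows roughly pacing_rate >> shift bytes to be queued per
socket. A quick user-space sketch of the arithmetic (the 1 Gbit/s pacing
rate is just an assumed example, not a number from this thread):

#include <stdio.h>

int main(void)
{
	/* Assumed example pacing rate: 1 Gbit/s, in bytes per second. */
	unsigned long pacing_rate = 125000000UL;
	int shift;

	for (shift = 5; shift <= 10; shift++)
		/* 2^-shift seconds' worth of data at the pacing rate. */
		printf("shift %2d: ~%.2f ms of queued data (%lu bytes at 1 Gbit/s)\n",
		       shift, 1000.0 / (1UL << shift), pacing_rate >> shift);
	return 0;
}

Each step down in the shift doubles the permitted queue, so a shift of 8
allows four times as much queued data as the default of 10, and a shift
of 6 sixteen times as much.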