W/RTT verification, linux tcp buffers behaviour

Here's my problem:

I am trying to verify the formula:

W/RTT = Max Throughput

between two end-hosts belonging to the same private network.

Where:

RTT stands for Round Trip Time
W = min(CWND, AW, SNDBUF)
CWND : the size of the congestion window
AW : the size of the receiver's advertized window
SNDBUF : the size of the send buffer
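
As a quick sanity check of the formula, here is a minimal Python sketch of the bound. The function name and the CWND value are placeholders of mine; the other numbers come from the measurements reported further down (SNDBUF = 8 KBytes, RTT = 12.7 ms, advertized window of 6293248 bytes).

```python
def max_throughput_mbps(cwnd_bytes, aw_bytes, sndbuf_bytes, rtt_seconds):
    """Upper bound W/RTT in Mbit/s, with W = min(CWND, AW, SNDBUF)."""
    window_bytes = min(cwnd_bytes, aw_bytes, sndbuf_bytes)
    return window_bytes * 8 / rtt_seconds / 1e6


# SNDBUF (8 KBytes) is the smallest of the three terms here.
# The CWND value is a placeholder chosen to be larger than SNDBUF.
bound = max_throughput_mbps(
    cwnd_bytes=1024 * 1024,
    aw_bytes=6293248,
    sndbuf_bytes=8 * 1024,
    rtt_seconds=12.7e-3,
)
print(f"{bound:.2f} Mbit/s")
```

With these inputs the bound comes out at about 5.16 Mbit/s, taking 1 KByte = 1024 bytes and 1 Mbit = 10^6 bits.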

In order to do this I have made various bandwidth measurements between the
two hosts. Specifically, I fixed the receiver's receive buffer at 16 MBytes
and varied the sender's send buffer between 8 KBytes and 16 MBytes.

As can be seen below, both hosts are capable machines connected to the
private network through Gigabit Ethernet.

Hosts' configuration:
--------------------------

Debian Linux 2.6.12-1-amd64-k8-smp
AMD Opteron 246/248
2GB RAM
80 GB HDD
Gigabit Ethernet
The TCP-specific options set via sysctl are listed at the end of this
message.


As for the network itself, it appears to be of excellent quality: no
retransmissions were reported during the experiments, and the RTT stayed
between 12 and 13 milliseconds.

Normally, one shouldn't expect to approach W/RTT very closely, but given
the quality of both the network (no losses and a very stable RTT) and the
end hosts, it is surprising to get at best only 70 % of W/RTT (see below
for results).

Bandwidth is measured with Iperf.
TCP buffer sizes are set through Iperf (via setsockopt()).
Traffic is dumped with tcpdump on both end hosts.
Traffic statistics are extracted from the tcpdump traces with tcptrace.

The tcpdump traces captured for each transfer confirm the network quality.

Here are some figures:

RTT    SNDBUF   RCVBUF   MAX SND   MAX AW    Iperf   W/RTT     %
---------------------------------------------------------------------
12,7   8        16384    8         6293248   3,57    5,16      69,18
12,7   16       16384    10,76     6293248   6,9     10,32     66,85
12,7   32       16384    21,4      6293248   13,7    20,64     66,37
12,7   64       16384    31,5      6293248   26,7    41,28     64,67
12,7   128      16384    49,5      6293248   54,4    82,56     65,88
12,7   256      16384    213       6293248   105     165,13    63,58
12,7   512      16384    266       6293248   171     330,26    51,77
12,7   1024     16384    -         6293248   382     660,52    57,83
12,7   2048     16384    -         6293248   673     1321,04   50,94
12,7   4096     16384    -         6293248   905     2642,08   34,25

RTT     : round trip time (milliseconds)
SNDBUF  : size of the TCP send buffer (KBytes)
RCVBUF  : size of the TCP receive buffer (KBytes)
MAX SND : average amount of data sent per RTT (KBytes),
          estimated from the tcpdump traces
MAX AW  : maximum size of the advertized window (bytes),
          taken from the tcpdump traces
Iperf   : throughput reported by Iperf (Mbits/sec)
W/RTT   : maximum reachable throughput (Mbits/sec)
%       : Iperf throughput as a percentage of W/RTT

Only the maximum size of the advertized window is reported, but in practice
the advertized window grows larger than SNDBUF within a few RTTs, so I
assumed it was safe to take W = min(CWND, SNDBUF). Since no retransmissions
are detected, CWND also grows beyond SNDBUF, so I took W = SNDBUF to compute
W/RTT.

As can be seen, we barely reach 70 % of the value predicted by the formula.
This seems to be because MAX SND remains relatively low compared to SNDBUF.
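
To make the gap concrete, the formula can be inverted: the window that would explain a measured throughput is W = throughput x RTT. The sketch below is only an illustration under that assumption; the helper name is mine, and the three rows are copied from the figures above.

```python
RTT_SECONDS = 12.7e-3


def effective_window_kbytes(throughput_mbps, rtt_seconds=RTT_SECONDS):
    """Window in KBytes that would yield the given throughput (W = throughput * RTT)."""
    return throughput_mbps * 1e6 * rtt_seconds / 8 / 1024


# (SNDBUF in KBytes, Iperf throughput in Mbit/s), taken from the table
rows = [(8, 3.57), (64, 26.7), (512, 171.0)]
for sndbuf_kb, iperf_mbps in rows:
    w_eff = effective_window_kbytes(iperf_mbps)
    share = 100 * w_eff / sndbuf_kb
    print(f"SNDBUF={sndbuf_kb:4d} KB  effective W={w_eff:7.2f} KB  ({share:.1f} % of SNDBUF)")
```

By construction, the share printed for each row is the same ratio as the % column, expressed as a fraction of the buffer rather than of the throughput.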

This leads me to the following questions.

Questions:
--------------

1) Am I missing or misunderstanding something?
2) Do you have any other ideas that could explain the low percentage reached?
3) Supposing the low percentage is really due to the sender's buffer not
   being fully used, why isn't it used to its fullest?
   Is there some way to overcome this?

Misc Questions:
--------------------

That is, questions I tried to answer myself by searching the internet, but
for which I found no satisfactory answer, or no answer at all.

4) Why does the advertized window grow steadily until it reaches 6 MBytes
instead of being set to 6 MBytes directly at the beginning of the
connection?
5) Why does the advertized window remain stuck at 6 MBytes?
6) Why does the kernel allocate twice the buffer size requested by
setsockopt()?
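
The doubling in question 6 can be observed without Iperf. The following is a minimal Python sketch, not what Iperf does internally; the helper name request_sndbuf is mine. It requests a send buffer size and reads back what the kernel reports. The reported value depends on the operating system and on limits such as net/core/wmem_max, so it is printed rather than assumed.

```python
import socket


def request_sndbuf(requested_bytes):
    """Request a send buffer size and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


requested = 64 * 1024
granted = request_sndbuf(requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

The same check works for the receive side by replacing SO_SNDBUF with SO_RCVBUF.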

Thank you in advance,

Constantinos


###################
#           /etc/sysctl.conf            #
###################

# I mainly disabled ecn, fack, dsack, autotuning
# Left rfc1323 as well as sack enabled
# Left only TCP Reno (i.e.: disabled bictcp, vegas, ...)

net/ipv4/tcp_tso_win_divisor=8
net/ipv4/tcp_moderate_rcvbuf=0
net/ipv4/tcp_bic=0
net/ipv4/tcp_vegas_cong_avoid=0
net/ipv4/tcp_westwood=0
net/ipv4/tcp_no_metrics_save=0
net/ipv4/tcp_low_latency=0
net/ipv4/tcp_frto=0
net/ipv4/tcp_tw_reuse=0
net/ipv4/tcp_adv_win_scale=2
net/ipv4/tcp_app_win=31
net/ipv4/tcp_dsack=0
net/ipv4/tcp_ecn=0
net/ipv4/tcp_reordering=3
net/ipv4/tcp_fack=0
net/ipv4/tcp_orphan_retries=0
net/ipv4/tcp_max_syn_backlog=1024
net/ipv4/tcp_rfc1337=0
net/ipv4/tcp_stdurg=0
net/ipv4/tcp_abort_on_overflow=0
net/ipv4/tcp_tw_recycle=0
net/ipv4/tcp_syncookies=0
net/ipv4/tcp_fin_timeout=60
net/ipv4/tcp_retries2=15
net/ipv4/tcp_retries1=3
net/ipv4/tcp_keepalive_intvl=75
net/ipv4/tcp_keepalive_probes=9
net/ipv4/tcp_keepalive_time=7200
net/ipv4/tcp_max_tw_buckets=180000
net/ipv4/tcp_max_orphans=65536
net/ipv4/tcp_synack_retries=5
net/ipv4/tcp_syn_retries=5
net/ipv4/tcp_retrans_collapse=1
net/ipv4/tcp_sack=1
net/ipv4/tcp_window_scaling=1
net/ipv4/tcp_timestamps=1
net/core/rmem_default=8388608
net/core/rmem_max=8388608
net/core/wmem_default=8388608
net/core/wmem_max=8388608
net/ipv4/ip_no_pmtu_disc=0
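
For anyone who wants to compare a host against these settings, here is a rough Python sketch. It assumes a Linux host where each key maps to a file under /proc/sys, and all function names are mine. Keys that do not exist on a given kernel are reported with an actual value of None.

```python
import os


def parse_sysctl_conf(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key):
    """Map a key such as 'net/ipv4/tcp_sack' to its file under /proc/sys."""
    return "/proc/sys/" + key


def mismatches(settings):
    """Return {key: (expected, actual)} for entries whose live value differs."""
    diffs = {}
    for key, expected in settings.items():
        path = sysctl_path(key)
        actual = None
        if os.path.exists(path):
            with open(path) as handle:
                actual = handle.read().strip()
        if actual != expected:
            diffs[key] = (expected, actual)
    return diffs


# Two lines copied from the configuration above
sample = """
net/ipv4/tcp_sack=1
net/core/wmem_max=8388608
"""
settings = parse_sysctl_conf(sample)
print(settings)
```

Calling mismatches(settings) on the host under test then lists every key whose live value differs from the file.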