Maybe one of the TCP options is interfering with the out-of-order
reception the receiving end experiences.
Try disabling all the options you can and repeat the test. Research why
each option is there and what it does. Some options target the other end
of the performance spectrum (window scaling, for example), so they won't
provide any assistance in your situation.
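As a rough starting point (assuming the usual 2.6-era /proc layout that
Fedora Core 3/4 ships), you can see which of the common options are
currently turned on with something like:

  cat /proc/sys/net/ipv4/tcp_sack
  cat /proc/sys/net/ipv4/tcp_dsack
  cat /proc/sys/net/ipv4/tcp_timestamps
  cat /proc/sys/net/ipv4/tcp_window_scaling

A value of 1 means the option is enabled, 0 means disabled.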
My guess would be SACK (selective acknowledgement) is causing the
receiving end to signal to the sending end to retransmit the (apparently)
lost packets it sees, when in reality these packets are delayed, not
lost, and it just doesn't know that yet. So disable SACK; on Linux try "echo 0
> /proc/sys/net/ipv4/tcp_sack". Try this at both ends (though maybe only
your bottlenecked / teql end needs it done).
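Something along these lines (an untested sketch; the sysctl names are the
standard ones, but check they exist on your kernel) turns SACK and D-SACK
off and keeps the setting across a reboot:

  echo 0 > /proc/sys/net/ipv4/tcp_sack
  echo 0 > /proc/sys/net/ipv4/tcp_dsack
  # or the sysctl equivalents, persisted in /etc/sysctl.conf:
  #   net.ipv4.tcp_sack = 0
  #   net.ipv4.tcp_dsack = 0

Run it at both ends, restart your TCP test, and compare the throughput
with the earlier run.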
There is also a delayed ACK mechanism that tries to reduce the number of
ACKs flowing the other way; it adds a small wait after data is received
so that marginally delayed data packets can be collated before the ACK is
sent back. Maybe that wait can be increased to help the collation
(provided it is kept within some percentage of the overall route RTT).
If the sending end receives multiple ACK packets carrying the same
acknowledgement number, it starts to conclude that the data just beyond
the acked point has gone missing; after 3 duplicate ACKs in a row it
backs off its sending rate and spits out a retransmission of what it
believes to be the lost segment. This is how it worked BEFORE SACK became
the default anyway; it is the TCP fast retransmit mechanism.
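You can see whether duplicate ACKs and fast retransmits are actually
happening by sniffing at the sending end while the test runs; a minimal
sketch (eth0 and port 5001 are just placeholders for your interface and
test port):

  tcpdump -n -i eth0 'tcp port 5001'

Look for runs of ACKs from the receiver carrying the same "ack" value,
followed immediately by the sender re-sending the same sequence range.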
What % of the Round Trip Time does the delay constitute? You talk of
1ms and 5ms deviation; if the RTT is at Ethernet-like
speeds then 5ms is a long time. All TCP timings are dynamic around what
the sending side computes the RTT to be, since the goal of sending bulk TCP
data is to fill the virtual pipeline between sender and receiver, but
to do so in a way that is co-operative with other users. Lost or
delayed packets are the principal indicator that the route is congested and
therefore the sending side backs off. If your best RTT is 7ms and worst
12ms you can't expect a few simple options to make much difference.
However, if the overall RTT is in the order of 70+ms there may be plenty
of room to see some improvements with configuration changes.
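To put a number on it, measure each path separately from the sending end
(10.0.0.2 and 10.0.1.2 are just placeholder addresses for the far end of
each link):

  ping -c 20 10.0.0.2
  ping -c 20 10.0.1.2

The min/avg/max line at the end gives you each path's baseline RTT and
its spread; compare your 1ms and 5ms figures against that.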
Can you improve the load balancing at the congested sending end? For
example, have you made sure the transmit queue at the interfaces holds
only a single packet? "ifconfig ppp0 txqueuelen 1", or some other low
number like 2 or 3. The default looks to be 64 these days, which is too
much if your teql interface also has a queue and the ppp0 interface goes
and asks teql for another packet every time it has space for one.
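A rough sketch, assuming the two slave links are ppp0 and ppp1:

  ifconfig ppp0 txqueuelen 1
  ifconfig ppp1 txqueuelen 1

That keeps each physical link holding at most one packet, so the next
packet is pulled from the teql queue only when a link is actually ready
to send it.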
Just some pointers for you.
Darryl
Li, Ji wrote:
I am measuring the performance of one TCP connection over two
symmetric paths. Packets are sent down the two paths alternately. I found
that when the latency of each path is within 1ms, the overall TCP
throughput is the *sum* of the throughput of the two paths. However,
when the latency of the two paths increases to 5ms, the overall TCP
throughput drops to the throughput of a *single* path. Has anyone
studied a similar problem? What makes the performance go down?
I use Fedora Core 3 and 4, teql and netem for my emulation.
------------------------------------------------------------------------
_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc