> I've got a dedicated 1000Mbps link between two sites with a rtt of 7ms,
> which seems to be dropping about 1 in 20000 packets (MTU of 1500 bytes).
> I've got identical boxes at either end of the link running 2.6.27
> (e1000e 0.3.3.3-k6), and I've been trying to saturate the link with TCP
> transfers in spite of the packet loss.

Why?

> I can chuck UDP at near-linespeed over the link (/dev/zero + nc), which
> seems to almost saturate it at 920Mbps. However, TCP throughput of a
> single stream (/dev/zero + nc) averages about 150Mbps. Looking at the
> tcptrace time sequence graphs of a capture, the TCP window averages out
> at about 3MB - although after an initial exponential ramp up, the moment
> the sender realises a packet is lost, the throughput appears to be
> clamped to only use about ~5% of the available window. I assume this is
> the congestion control algorithm at the sender applying a congestion
> window.

No, not really, per se.

TCP sends packets until the Tx window is full. The Rx host receives the
packets and assembles them in order. It sends an ACK pointing to the
highest-numbered packet in the successfully assembled stream, saving but
ignoring any out-of-sequence packets. Thus, if the receiving host gets
the first 12 and the last 6 out of 20 packets, it sends an ACK for
packet #12, and then just waits.

Having received an ACK for #12, the Tx host moves the start of the
window to packet #13 and transmits the remaining packets up to the end
of the window. It then sits and waits for another ACK. Since packets
#13 and #14 never reached the Rx host, it also simply waits, holding
packets #15 through the end of the window, and both hosts sit idle.

After an implementation-dependent wait period (usually about 2 seconds),
the Tx host starts re-sending the entire window contents, which in this
case starts with packet #13. After receiving packets #13 and #14, the Rx
host now has the entire window fully assembled, so it sends an ACK for
the last packet in the window. Meanwhile, the Tx host has likely sent
some number of packets beyond #14; these are received by the Rx host and
discarded.

Upon receiving an ACK for the last packet in the window, the Tx host
discards the entire contents of the window and loads a whole new window,
transmitting as much of it as it can before receiving an ACK for some
packet in the window. Having received that ACK, it again moves the
window forward. If the latency is low compared to the bandwidth of the
media and the window size, the Tx host never pauses, since the window is
moved forward before it has finished being transmitted.

If a re-transmit is required, then TCP does adjust the window size to
accommodate what it presumes is congestion on the link. It also never
starts out streaming at full bandwidth: it continually adjusts its
window size upwards until it encounters what it interprets as congestion
issues, or reaches the maximum window size supported by the two hosts.

> I've tried increasing the network buffer sizing at both ends...

That won't help.

> What else should I be doing to crank up the throughput and defeat the
> congestion control?

Why would you be trying to do this? It is true that TCP works well with
congested links, but not so well with links suffering random errors. You
aren't going to be successful in breaking the TCP handshaking parameters
without breaking TCP itself. TCP guarantees delivery of the packets to
the application layer intact and in order; the behavior of TCP on a
dirty link is an artifact of that requirement.
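To put rough numbers on it, here is some back-of-the-envelope arithmetic
using only the figures from your post (1000Mbps link, 7ms rtt, the ~3MB
window and ~5% utilisation you saw in tcptrace). Nothing here is
measured; it is just a consistency check:

#include <stdio.h>

int main(void)
{
    double link_bps = 1000e6;   /* 1000Mbps link                        */
    double rtt_s    = 0.007;    /* 7ms round-trip time                  */
    double window   = 3e6;      /* ~3MB window per tcptrace             */

    /* Bytes that must be "in flight" at all times to keep the pipe full. */
    double bdp = link_bps / 8.0 * rtt_s;

    printf("bandwidth-delay product: ~%.0f KB\n", bdp / 1024.0);
    printf("rate if only 5%% of the window is in flight per rtt: ~%.0f Mbps\n",
           0.05 * window * 8.0 / rtt_s / 1e6);
    return 0;
}

That works out to roughly 170Mbps, which is in the same ballpark as the
~150Mbps you measured, so your numbers are at least consistent with the
sender spending most of its time with very little of the window actually
in flight.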
If you want to deliver at full speed, use UDP, and have the application
layer handle lost packets (a rough sketch of what that can look like is
at the end of this message). If you did not write the application (or
have a developer do it for you), and it does not support UDP transfers,
then there is nothing you can do about it.

> Could jumbo frames help?

No. If anything, they may make it worse. Noisy links call for small
frames.

> Is there an equivalent
> congestion control module to RUDE_TCP
> (http://www.weedissent.net/code/rude_tcp-2.4.20.patch) for 2.6, or is

I don't know, but RUDE_TCP was developed specifically for high-quality
networks, not noisy ones.

> this a Bad Idea?

Yes. RUDE_TCP notwithstanding, there are ways other than TCP's to
guarantee data delivery, and each method has its own strengths and
drawbacks. No matter what transfer protocol is used, however,
guaranteeing delivery of a stream segment requires that the entire
segment be assembled completely at the Rx host before moving on.
Consequently, once the entire segment has been transmitted, the process
must halt in some fashion until the Tx host is notified that the whole
segment was received intact. This places an upper limit on the overall
transmission rate directly proportional to the size of the Rx buffer
(roughly one buffer's worth of data per round trip). This can be done at
the application layer, or it can be done at some other layer, in this
case TCP.

Handling a link that is expected to be noisy is definitely best done
through some protocol other than TCP, assuming such flexibility is
available. Of course, if data integrity is not critical, then UDP is a
better choice in any case. That's why VoIP, NTP, etc., employ UDP.
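For what it's worth, below is a very rough sketch of the sending side of
the "UDP plus application-layer recovery" idea mentioned above. The
destination address, port, and payload size are placeholders, error
checking is omitted, and the recovery logic is only described in
comments; it illustrates the shape of the approach, not a drop-in
replacement for anything.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PAYLOAD 1400  /* keep datagram + headers under the 1500-byte MTU */

int main(void)
{
    /* 192.0.2.1:9000 is a placeholder destination. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(9000) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    unsigned char buf[sizeof(uint32_t) + PAYLOAD];
    uint32_t seq = 0;

    for (;;) {
        uint32_t net_seq = htonl(seq++);
        memcpy(buf, &net_seq, sizeof net_seq);  /* sequence number header */
        /* ...fill buf + sizeof net_seq with PAYLOAD bytes of real data... */
        sendto(fd, buf, sizeof buf, 0,
               (struct sockaddr *)&dst, sizeof dst);
        /* The receiver tracks the highest sequence number seen, notes any
         * gaps, and asks for the missing numbers again over a side channel
         * (or this same socket). How aggressively it does that, and whether
         * the sender paces itself at all, is entirely up to the application,
         * which is both the appeal and the danger of this approach. */
    }

    close(fd);  /* not reached in this sketch */
    return 0;
}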