> I've got a dedicated 1000Mbps link between two sites with a rtt of 7ms,
> which seems to be dropping about 1 in 20000 packets (MTU of 1500 bytes).
> I've got identical boxes at either end of the link running 2.6.27
> (e1000e 0.3.3.3-k6), and I've been trying to saturate the link with TCP
> transfers in spite of the packet loss.

Why?

> I can chuck UDP at near-linespeed over the link (/dev/zero + nc), which
> seems to almost saturate it at 920Mbps. However, TCP throughput of a
> single stream (/dev/zero + nc) averages about 150Mbps. Looking at the
> tcptrace time sequence graphs of a capture, the TCP window averages out
> at about 3MB - although after an initial exponential ramp up, the moment
> the sender realises a packet is lost, the throughput appears to be
> clamped to only use about ~5% of the available window. I assume this is
> the congestion control algorithm at the sender applying a congestion
> window.

No, not really, per se.

TCP sends packets until the Tx window is full. The Rx host receives the
packets and assembles them in order. It sends an ACK pointing to the
highest-numbered packet in the successfully assembled stream, saving but
ignoring any out-of-sequence packets. Thus, if the receiving host gets
the first 12 and the last 6 out of 20 packets, it sends an ACK for
packet #12, and then just waits.

Having received an ACK for #12, the Tx host moves the start of the
window to packet #13 and transmits the remaining packets up to the end
of the window. It then sits and waits for another ACK. Since packets
#13 and #14 never reached the Rx host, it also simply waits, holding
packets #15 through the end of the window, and both hosts sit idle.

After an implementation-dependent wait period (usually about 2 seconds),
the Tx host starts re-sending the entire window contents, which in this
case starts with packet #13. After receiving packets #13 and #14, the Rx
host now has the entire window fully assembled, so it sends an ACK for
the last packet in the window. Meanwhile, the Tx host has likely sent
some number of packets beyond #14; these are received by the Rx host and
discarded.

Upon receiving an ACK for the last packet in the window, the Tx host
discards the entire contents of the window and loads a whole new window,
transmitting as much of it as it can before receiving an ACK for some
packet in the window. Having received that ACK, it again moves the
window forward. If the latency is low compared to the bandwidth of the
media and the window size, the Tx host never pauses, since the window is
moved forward before it has finished being transmitted.

If a re-transmit is required, then TCP does adjust the window size to
accommodate what it presumes is congestion on the link. It also never
starts out streaming at full bandwidth: it continually adjusts its
window size upwards until it encounters what it interprets as congestion
issues, or reaches the maximum window size supported by the two hosts.

> I've tried increasing the network buffer sizing at both ends...

That won't help.

> What else should I be doing to crank up the throughput and defeat the
> congestion control?

Why would you be trying to do this? It is true that TCP works well with
congested links, but not so well with links suffering random errors. You
aren't going to be successful in breaking the TCP handshaking parameters
without breaking TCP itself. TCP guarantees delivery of the packets to
the application layer intact and in order; the behavior of TCP on a
dirty link is an artifact of that requirement.
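To put rough numbers on it, here is some back-of-the-envelope arithmetic
using only the figures from your post (1000Mbps link, 7ms rtt, the ~3MB
window and ~5% utilisation you saw in tcptrace). Nothing here is
measured; it is just a consistency check:

#include <stdio.h>

int main(void)
{
    double link_bps = 1000e6;   /* 1000Mbps link                        */
    double rtt_s    = 0.007;    /* 7ms round-trip time                  */
    double window   = 3e6;      /* ~3MB window per tcptrace             */

    /* Bytes that must be "in flight" at all times to keep the pipe full. */
    double bdp = link_bps / 8.0 * rtt_s;

    printf("bandwidth-delay product: ~%.0f KB\n", bdp / 1024.0);
    printf("rate if only 5%% of the window is in flight per rtt: ~%.0f Mbps\n",
           0.05 * window * 8.0 / rtt_s / 1e6);
    return 0;
}

That works out to roughly 170Mbps, which is in the same ballpark as the
~150Mbps you measured, so your numbers are at least consistent with the
sender spending most of its time with very little of the window actually
in flight.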
If you want to deliver at full speed, use UDP, and have the application
layer handle lost packets (a rough sketch of what that can look like is
at the end of this message). If you did not write the application (or
have a developer do it for you), and it does not support UDP transfers,
then there is nothing you can do about it.

> Could jumbo frames help?

No. If anything, they may make it worse. Noisy links call for small
frames.

> Is there an equivalent
> congestion control module to RUDE_TCP
> (http://www.weedissent.net/code/rude_tcp-2.4.20.patch) for 2.6, or is

I don't know, but RUDE_TCP was developed specifically for high-quality
networks, not noisy ones.

> this a Bad Idea?

Yes. RUDE_TCP notwithstanding, there are ways other than TCP's to
guarantee data delivery, and each method has its own strengths and
drawbacks. No matter what transfer protocol is used, however,
guaranteeing delivery of a stream segment requires that the entire
segment be assembled completely at the Rx host before moving on.
Consequently, once the entire segment has been transmitted, the process
must halt in some fashion until the Tx host is notified that the whole
segment was received intact. This places an upper limit on the overall
transmission rate directly proportional to the size of the Rx buffer
(roughly one buffer's worth of data per round trip). This can be done at
the application layer, or it can be done at some other layer, in this
case TCP.

Handling a link that is expected to be noisy is definitely best done
through some protocol other than TCP, assuming such flexibility is
available. Of course, if data integrity is not critical, then UDP is a
better choice in any case. That's why VoIP, NTP, etc., employ UDP.
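For what it's worth, below is a very rough sketch of the sending side of
the "UDP plus application-layer recovery" idea mentioned above. The
destination address, port, and payload size are placeholders, error
checking is omitted, and the recovery logic is only described in
comments; it illustrates the shape of the approach, not a drop-in
replacement for anything.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PAYLOAD 1400  /* keep datagram + headers under the 1500-byte MTU */

int main(void)
{
    /* 192.0.2.1:9000 is a placeholder destination. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(9000) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

    unsigned char buf[sizeof(uint32_t) + PAYLOAD];
    uint32_t seq = 0;

    for (;;) {
        uint32_t net_seq = htonl(seq++);
        memcpy(buf, &net_seq, sizeof net_seq);  /* sequence number header */
        /* ...fill buf + sizeof net_seq with PAYLOAD bytes of real data... */
        sendto(fd, buf, sizeof buf, 0,
               (struct sockaddr *)&dst, sizeof dst);
        /* The receiver tracks the highest sequence number seen, notes any
         * gaps, and asks for the missing numbers again over a side channel
         * (or this same socket). How aggressively it does that, and whether
         * the sender paces itself at all, is entirely up to the application,
         * which is both the appeal and the danger of this approach. */
    }

    close(fd);  /* not reached in this sketch */
    return 0;
}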