On Fri, 14 Nov 2003, Ted Duffy wrote:

> - Initially, the TCP1 application and TCP2 application send lots of
>   little messages back and forth. Due to piggy-back acks and low
>   latency, TCP1 never observes packets_in_flight greater than its
>   sender congestion window, so its sender congestion window stays
>   small.
>
> - The TCP1 application then sends lots of little messages to the TCP2
>   application, using nonagle. Due to the small congestion window,
>   TCP1 only sends several packets before waiting for an ack.
>
> - The TCP2 application will not send any data back until it receives
>   many of the little messages, so TCP2 delays using the delayed ack
>   mechanism, eventually acking the first couple of little packets.
>
> - TCP1 gets the ack, increases its sender congestion window, and sends
>   the rest of the TCP1 application's little data messages (which
>   usually have been merged into a single packet).
>
> - The TCP2 application now sends lots of large messages.
>
> - Repeat the previous 4 steps.
>
> By removing that one conditional in tcp_ack() mentioned above, the
> sender congestion window in TCP1 is immediately increased much above
> the default 2 packets in the initial step, resulting in no delayed-ack
> delays.

How many little messages is this? I think this is really caused by the
fact that delayed ack is determined by bytes, but cwnd is in packets.
If you have the application source, you could set TCP_CORK while
writing the small messages, or TCP_QUICKACK on the receiver.

What I don't understand is why you don't run into this problem with
GigE. Could you get some tcpdumps? Does cwnd somehow get ramped up
during Step 1?

I think that that test does not belong there, anyway. Could the author
comment?

> After reading over RFC 2581, and other TCP-related RFCs (793, 2861,
> 1323, 1337), I can find no explanation for the limit being placed on
> the execution of the congestion avoidance algorithm as is currently
> done by that one conditional in tcp_ack().
> My interpretation of RFC 2581 is that during slow start (the initial
> congestion avoidance algorithm state), the sender congestion window
> should be increased by a packet at every received ack. This should
> continue until congestion is observed or the sender congestion window
> exceeds the slow start threshold, at which point the congestion
> avoidance algorithm should enter the congestion avoidance state.

Not entirely correct. Look at RFC 2861 (cwnd validation) which Linux
implements.

-John
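
For reference, a minimal sketch of the two workarounds suggested above:
TCP_CORK on the sender while it writes the small messages, and
TCP_QUICKACK on the receiver. It assumes Linux and Python's socket
module; the helper names send_corked and enable_quickack are made up
for illustration and are not taken from the application under
discussion.

```python
import socket


def send_corked(sock, messages):
    """Write many small messages while the socket is corked (Linux only).

    With TCP_CORK set, the stack holds back partial segments so the
    small writes can be coalesced into full-sized packets.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for msg in messages:
            sock.sendall(msg)
    finally:
        # Clearing the option flushes whatever partial segment is queued.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)


def enable_quickack(sock):
    """Ask the local stack to ack incoming data without delay (Linux only).

    The flag is not permanent: the stack may fall back to delayed acks
    later, so callers usually set it again after each read.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
```

On the TCP1 side this would wrap the burst of small writes, e.g.
send_corked(conn, list_of_small_messages); on the TCP2 side,
enable_quickack(conn) would be called after accepting the connection
and again after each recv(). Whether either option removes the stall
seen here is untested and would need confirming with tcpdump.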