I'm not sure that what you are describing is the problem I'm seeing. Let
me recap the configuration & ask if reordering SHOULD occur [and trigger
the congestion detection].

  System 1                    System 2
  dual CPUs                   dual CPUs
  NIC 1 ---- switch1 ---- NIC 1 (eth0)
  NIC 2 ---- switch2 ---- NIC 2 (eth1)

The "bond0" interface has both eth0 and eth1 used for the connections
between the two systems. The NICs, cables & switches are independent.
Both switches are private LANs, with no planned activity other than the
test except for typical daemon activity for the systems under test.

The netpipe program basically exchanges data between the two systems,
starting with small block sizes & growing at "interesting" sizes [the
default is basically 2^n +/- 1], and records throughput [bytes per
second] and latency [1/2 the round trip time]. I don't think it uses
multiple streams of data - I will check to be sure.

What I see is
 - the single channel result [prior to channel bonding] is OK, no odd
   behavior.
 - the two channel result [with channel bonding] is not OK, severe drop
   outs.
That indicates to me that the basic drivers and switches are sound. That
also indicates to me that the bond0 interface is getting confused. I
assume that the channel bonding code splits the large packets & sends
about 1/2 the data on each of eth0 and eth1 - results from ifconfig tend
to confirm this [transmits on eth0/eth1 add up to transmits on bond0,
byte counts are similar between eth0 and eth1].
 - Would the channel bonding code send the packets out of order [to
   cause the problem you describe]?
 - I thought about packet drops as well, perhaps an SMP race condition?
   We've noted that the error counts from ifconfig don't increment until
   a slight increase at the end of the run, so it's not counting the
   problems if they do occur. [had 22 overruns recorded after >1.5M
   packets transmitted & received without any overruns! Overruns on one
   machine only, not both.]
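To make the out-of-order question concrete, here is a minimal sketch of
how striping one stream across two links could reorder packets even when
nothing is dropped. It is only an illustration under stated assumptions:
strict round-robin (even packets on eth0, odd packets on eth1) is my
assumption and is not taken from the bonding driver source, and the
function names, delays and send interval are hypothetical values in
arbitrary time units, not measurements from our setup.

```python
def arrival_order(num_packets, delay_eth0, delay_eth1, send_interval):
    """Return packet sequence numbers in the order the receiver sees them.

    Assumption (not from the bonding source): strict round-robin, with
    even-numbered packets sent on eth0 and odd-numbered ones on eth1.
    Delays and the send interval are integers in arbitrary time units.
    """
    arrivals = []
    for seq in range(num_packets):
        send_time = seq * send_interval
        delay = delay_eth0 if seq % 2 == 0 else delay_eth1
        arrivals.append((send_time + delay, seq))
    # The receiver sees packets by arrival time (ties broken by seq).
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a higher-numbered packet."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


if __name__ == "__main__":
    # Identical paths: packets arrive in the order they were sent.
    print(arrival_order(6, 2, 2, 1))
    # eth0 path slower than eth1 path (hypothetical): arrivals interleave.
    skewed = arrival_order(6, 5, 1, 1)
    print(skewed, count_out_of_order(skewed))
```

If the two paths really had identical delay, this model shows no
reordering at all, so the sketch only matters if something (switch
queueing, interrupt handling on one NIC, etc.) makes one path lag the
other by more than the gap between sends.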
I'll also see if I can get a copy of the latest kernel & see if the
problem recurs & get back on that test [tomorrow?]. Thanks.

--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>


"Andi Kleen" <ak@suse.de>
08/23/00 08:29 AM

To:      Mark H Johnson/RTS/Raytheon/US@RTS
cc:      linux-net@vger.kernel.org
Subject: Re: Performance with ethernet channel bonding


On Tue, Aug 22, 2000 at 11:15:47AM -0500, Mark_H_Johnson@Raytheon.com wrote:
> As part of a study for NASA, we ran "netpipe" on both single channel and
> two channel bonded Ethernet networks and got some odd results.
>  - The single channel results looked OK.
>  - The channel bonded results had some serious performance drops.
> I was not able to find a documented problem about this. Is this a known
> problem [with a work around?] or if new, what kind of information will be
> needed to help isolate it? I've included some sample data about our
> configuration and the symptoms below. I will gladly send raw data & more
> detailed configuration information if that will help. Please respond to me
> directly - I don't subscribe to linux-net & linux-kernel-digest was still
> down the last time I've checked. Thanks.

The performance drops are probably caused by packet reordering. Extensive
packet reordering causes TCP to detect congestion and causes extensive
retransmits and lower congestion windows. Linux 2.4.0test7-preLATEST has
some sender side improvements to handle reordering in the network better
(and some receiver side hacks that may or more likely may not work).

The best solution is to avoid reordering by not sending more than a single
stream / interface.

-Andi
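As a rough illustration of why reordering can look like congestion to
TCP, the sketch below models a receiver that ACKs the next packet number
it expects and a sender that treats three duplicate ACKs as a loss signal
(the standard fast-retransmit threshold). Everything else is a
simplifying assumption: packets are numbered rather than bytes, there are
no delayed ACKs or SACK, nothing is actually lost (so every retransmit
counted here is unnecessary), and the function names and sample arrival
orders are hypothetical.

```python
def cumulative_acks(arrival_order):
    """ACK value (next expected packet number) sent after each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrival_order:
        received.add(seq)
        # Advance past every packet that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def spurious_fast_retransmits(acks, dup_threshold=3, initial_ack=0):
    """Count how often the sender sees dup_threshold duplicate ACKs.

    initial_ack is the last ACK value the sender saw before these
    arrivals (assumed 0 here, i.e. nothing delivered yet).
    """
    retransmits = 0
    dup_count = 0
    last_ack = initial_ack
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits += 1
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


if __name__ == "__main__":
    for order in ([0, 1, 2, 3, 4], [1, 0, 3, 2], [1, 2, 3, 0, 4]):
        acks = cumulative_acks(order)
        print(order, acks, spurious_fast_retransmits(acks))
```

In this model a simple swap of adjacent packets stays under the
threshold, while a packet that is overtaken by three later ones triggers
a retransmit even though it was never lost. Each such event would also
shrink the congestion window in a real stack, which is consistent with
the throughput drops described above.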